
OCPBUGS-83871: Support bootloader update #5868

Open
djoshy wants to merge 1 commit into openshift:main from djoshy:bootloader-update

@djoshy
Contributor

@djoshy djoshy commented Apr 21, 2026

- What I did
This PR adds support for bootloader updates prior to an OS update from the MCO, based on the steps described in the official OCP docs. The bootupctl command is run from the incoming container image, similar to the way we use rpm-ostree when the bootimage is too old. On failure, we fall back to a blunt copy based on the update script from openshift/os#1795.

Notes:

  • This mechanism runs during the MCD's firstboot service and during a regular OS update orchestrated by the MCO, so it should cover initial scale-up and upgrade scenarios.
  • This mechanism only runs on aarch64/x86_64 nodes; it is disabled on other architectures, as Secure Boot isn't applicable to them.
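The architecture gate in the note above can be expressed as a check on Go's `runtime.GOARCH`, which reports `amd64` for x86_64 and `arm64` for aarch64. This is a minimal sketch; the helper names `bootloaderUpdateSupported` and `supportedOnThisHost` are hypothetical and do not come from the PR.

```go
package main

import "runtime"

// bootloaderUpdateSupported is a hypothetical helper: it reports whether the
// bootloader update should be attempted for the given Go architecture name.
// Secure Boot is only relevant on x86_64 (amd64) and aarch64 (arm64).
func bootloaderUpdateSupported(goarch string) bool {
	switch goarch {
	case "amd64", "arm64":
		return true
	default:
		return false
	}
}

// supportedOnThisHost applies the check to the architecture this binary runs on.
func supportedOnThisHost() bool {
	return bootloaderUpdateSupported(runtime.GOARCH)
}
```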

- How to verify it

A quick test:

  1. A cluster with the following configuration is required:
    • 4.22 or later cluster
    • Secure Boot enabled
  2. Inject one of the following bootimages and scale up:
    • 4.13: projects/rhcos-cloud/global/images/rhcos-413-92-202402131523-0-gcp-x86-64
    • 4.14: projects/rhcos-cloud/global/images/rhcos-414-92-202402130420-0-gcp-x86-64
    • 4.15: projects/rhcos-cloud/global/images/rhcos-415-92-202402201450-0-gcp-x86-64
  3. This should succeed and will use the blunt copy fallback path. A 9.6 or newer bootimage will use the direct copy path. The detailed logs for both paths can be found in this comment.

An upgrade test:
A longer way to test is to start a Secure Boot cluster on one of the affected OCP versions and chain-upgrade it to 4.22. This should succeed as well. This comment on the bug lists the affected versions.

Summary by CodeRabbit

  • New Features

    • Attempts a container-based bootloader update before layered-OS processing.
  • Bug Fixes / Reliability

    • Bootloader update failures are logged as warnings and no longer block updates.
    • Added a host-side EFI fallback with improved ESP discovery across disks and RAID, and safer EFI deployment steps.
  • Tests

    • New tests for EFI/ESP detection and boot→ESP mapping across common disk and RAID layouts.
  • Chores

    • Updated security-scan exclusions for the new bootloader updater.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Walkthrough

Attempt a bootloader update early in updateLayeredOS by running a containerized bootupctl update; on container failure, fall back to extracting EFI from the image and performing host-side ESP detection, mounting, EFI copy, sync, and cleanup. Errors are logged and the existing layering/rebase flow continues.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Update integration | pkg/daemon/update.go | Call runBootupdViaContainer(newURL) early in updateLayeredOS; treat failures as warnings and continue existing layering/rebase logic. |
| Bootloader update implementation | pkg/daemon/bootupd.go | New daemon code to run bootupctl update inside a container via systemd-run/podman with image-pull/auth handling; on failure, extract /usr/lib/bootupd/updates/EFI from the image, detect ESP(s) via lsblk --json and findmnt /boot, mount ESP(s), remove orphan .btmp.* entries, log sha256 sums, copy EFI files, sync and clean up. |
| Device/ESP unit tests | pkg/daemon/bootupd_test.go | New table-driven tests with lsblk fixtures (single-disk, multi-disk, RAID) for helpers: findDeviceByName, countDisksWithESP, findAnyESP, findESPOnBootDisk, findBootPartition, findRAIDESPs. |
| Snyk policy | .snyk | Add Snyk exclusion for pkg/daemon/bootupd.go targeting CWE-23 (Path Traversal). |
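ESP detection from `lsblk --json` boils down to walking the device tree and matching the GPT partition type GUID of an EFI System Partition (`c12a7328-f81f-11d2-ba4b-00a0c93ec93b`). The sketch below assumes lsblk was invoked with `-o NAME,FSTYPE,PARTTYPE`; the type and function names are illustrative and are not the helpers tested in bootupd_test.go.

```go
package main

import (
	"encoding/json"
	"strings"
)

// espPartType is the GPT partition type GUID of an EFI System Partition.
const espPartType = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

// blockDevice mirrors the subset of `lsblk --json -o NAME,FSTYPE,PARTTYPE`
// output used here. JSON nulls simply leave the string fields empty.
type blockDevice struct {
	Name     string        `json:"name"`
	FSType   string        `json:"fstype"`
	PartType string        `json:"parttype"`
	Children []blockDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []blockDevice `json:"blockdevices"`
}

// findESPsByDisk returns, for each top-level disk, the names of its child
// partitions whose GPT type matches the ESP GUID (compared case-insensitively).
func findESPsByDisk(raw []byte) (map[string][]string, error) {
	var out lsblkOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	result := map[string][]string{}
	for _, disk := range out.BlockDevices {
		for _, part := range disk.Children {
			if strings.EqualFold(part.PartType, espPartType) {
				result[disk.Name] = append(result[disk.Name], part.Name)
			}
		}
	}
	return result, nil
}
```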

Sequence Diagram(s)

sequenceDiagram
    participant Updater as updateLayeredOS()
    participant Daemon as runBootupdViaContainer()
    participant Systemd as systemd-run
    participant Podman as podman
    participant Container as bootupctl (in image)
    participant Host as Host FS/tools

    Updater->>Daemon: request bootloader update for imageURL
    Daemon->>Podman: check local image
    alt image missing
        Daemon->>Podman: podman pull (with authfiles)
    end
    Daemon->>Systemd: systemd-run podman run --privileged --pid=host --net=host --rm (mounts, INVOCATION_ID) bootupctl update
    Systemd->>Podman: start container
    Podman->>Container: run bootupctl update (access /boot)
    alt container run succeeds
        Container-->>Daemon: success
    else container run fails
        Container-->>Daemon: error (logged)
        Daemon->>Host: extract EFI artifacts from image to tmp
        Daemon->>Host: query `lsblk --json` and `findmnt /boot`
        Daemon->>Host: mount ESPs, remove orphan dirs, log sha256, copy EFI files, sync, unmount
    end
    Daemon-->>Updater: return (warning or nil)
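For reference, the podman argument list in the first diagram can be assembled like this. The flags are the ones quoted in this PR's review threads (including the `/dev` mount discussed there); the helper name is hypothetical, and the systemd-run wrapper and image pull/auth handling are deliberately left out.

```go
package main

// bootupctlUpdateArgs is a hypothetical helper that assembles the podman
// arguments for running `bootupctl update` from the incoming OS image.
func bootupctlUpdateArgs(imageURL string) []string {
	return []string{
		"run",
		"--privileged", // bootupctl needs raw access to the ESP and block devices
		"--pid=host", "--net=host",
		"--rm",                     // remove the container once bootupctl exits
		"-v", "/boot:/boot:rslave", // host /boot, with host sub-mounts propagated in
		"-v", "/dev:/dev", // device nodes, per the review thread, so bootupd can mount ESPs itself (e.g. RAID)
		imageURL,
		"bootupctl", "update",
	}
}
```

It would be used as `exec.Command("podman", bootupctlUpdateArgs(newURL)...)`.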
sequenceDiagram
    participant Daemon as runBootupdViaContainer()
    participant Image as Image Registry/Local Store
    participant Extract as extractor (container/tar)
    participant LSBLK as lsblk --json
    participant Mount as mount/umount
    participant Copy as cp/rsync + sha256

    Daemon->>Image: ensure image available (pull if needed)
    Daemon->>Extract: extract `/usr/lib/bootupd/updates/EFI` to tmp
    Daemon->>LSBLK: query block devices and `/boot` mapping
    LSBLK-->>Daemon: device topology JSON
    Daemon->>Mount: mount candidate ESPs at `/boot/efi` (handle RAID members)
    Daemon->>Copy: remove `.btmp.*`, log sha256 checksums, copy EFI contents
    Copy->>Mount: sync
    Daemon->>Mount: unmount cleanup
    Daemon-->>Daemon: return to caller

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'OCPBUGS-83871: Support bootloader update' directly and clearly describes the main change: adding bootloader update support via bootupctl. It is concise, specific, and accurately reflects the primary objectives of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | The test file contains six deterministic test functions with stable names and static test fixtures, without any dynamic information like timestamps, UUIDs, or generated suffixes. |
| Test Structure And Quality | ✅ Passed | PR adds standard Go unit tests using the testing package, not Ginkgo tests. Table-driven tests follow good practices with proper error assertions and fixtures. |
| Microshift Test Compatibility | ✅ Passed | PR does not add any Ginkgo e2e tests; only standard Go unit tests using the testing package. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The PR adds only standard Go unit tests in bootupd_test.go, not Ginkgo e2e tests. Check does not apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces bootloader update functionality through daemon-level code changes without Kubernetes deployment manifest modifications, ensuring topology compatibility. |
| Ote Binary Stdout Contract | ✅ Passed | MCO daemon is infrastructure code running as a systemd service, not an OTE-exposed binary. No process-level entry points that could violate the stdout contract. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR adds standard Go unit tests for device/partition detection logic with no Ginkgo e2e tests or connectivity assumptions. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@djoshy djoshy changed the title [DNM] Support bootloader update [DNM] MCO-2218: Support bootloader update Apr 21, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 21, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 21, 2026

@djoshy: This pull request references MCO-2218 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2026
Comment thread pkg/daemon/update.go Outdated
Comment on lines +2795 to +2802
// -v /boot:/boot:rslave mounts the host's /boot directory at /boot inside the
// container. The :rslave ("recursive slave") propagation option is required
// because /boot/efi is a separate filesystem (the EFI System Partition) mounted
// inside /boot on the host. A plain bind mount would expose the /boot directory
// but not that nested mount; rslave propagates sub-mounts from the host into the
// container, making /boot/efi visible where bootupctl expects to read and write
// bootloader files. bootupctl has no --sysroot flag, so it always operates on
// the real /boot path.
Contributor Author


Claude insisted I add this volume mount argument, so I asked for an explanation. It makes sense to me, but someone with more bootloader experience than me should definitely review :)

@djoshy djoshy changed the title [DNM] MCO-2218: Support bootloader update OCPBUGS-83869: Support bootloader update Apr 21, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/update.go`:
- Around line 2808-2817: The current logic treats failures from
statusCmd.Output() and json.Unmarshal(...) as non-fatal by returning nil; change
this to propagate the error so bootloader update is enforced: when
statusCmd.Output() returns an error, return that error (or wrap it with context)
instead of logging and returning nil; likewise, on json.Unmarshal failure of
statusOut into bStatus, return the parse error (unless you explicitly detect and
handle a known "unsupported image" sentinel from bootupctl and skip only that
case). Update the error paths around statusCmd and the json.Unmarshal into
bStatus to fail-fast by returning the error with context rather than silently
continuing.
- Around line 2782-2790: The current code in updateLayeredOS unconditionally
builds pullArgs and calls runCmdSync("systemd-run", ...) which forces a network
pull and breaks disconnected/bootstrap flows; instead, first check for a
locally-available image using the existing lookup path
(call/GetPodmanImageInfoByReference or the same local-storage/PIS logic used
later) and only assemble and execute the systemd-run pull command (the block
using systemdPodmanArgs, "--authfile", and conditional "--signature-policy" via
podmanSupportsSigstore()) if the image is not present locally; if the local
image is found, skip the pull and proceed with the existing rebase/local
handling in updateLayeredOS.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e9c56b4-06d7-4574-9651-486c1637bbd9

📥 Commits

Reviewing files that changed from the base of the PR and between c3a9db7 and 07db816.

📒 Files selected for processing (1)
  • pkg/daemon/update.go

Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Apr 21, 2026
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-83869, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

Details

In response to this:

- What I did

- How to verify it

- Description for the changelog

Summary by CodeRabbit

  • New Features
  • Added bootloader update support for CoreOS systems. During layered OS updates, the system automatically detects the bootloader state and applies the appropriate update strategy. Updates proceed safely if bootloader operations fail.


@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Apr 21, 2026
@openshift-ci openshift-ci Bot requested a review from sergiordlr April 21, 2026 16:18
@yuqi-zhang yuqi-zhang changed the title OCPBUGS-83869: Support bootloader update OCPBUGS-83871: Support bootloader update Apr 21, 2026
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@travier
Member

travier commented Apr 21, 2026

So I don't think that's going to work as-is: bootupd mounts the ESP dynamically as needed for RAID setups, so it needs access to /dev.

Comment thread pkg/daemon/update.go Outdated
@djoshy
Contributor Author

djoshy commented Apr 21, 2026

So I don't think that's going to work as-is: bootupd mounts the ESP dynamically as needed for RAID setups, so it needs access to /dev.

Would mounting /dev be sufficient to fix that?

@djoshy djoshy closed this Apr 21, 2026
@djoshy djoshy force-pushed the bootloader-update branch from 07db816 to c3a9db7 Compare April 21, 2026 17:19
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-83871. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.


@djoshy djoshy reopened this Apr 21, 2026
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@djoshy
Contributor Author

djoshy commented Apr 21, 2026

oops, accidentally closed - just reopened 😅

@travier - I've added the /dev mount in the calls, but happy to change the approach if this isn't viable.

/retest

@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

- What I did

- How to verify it

- Description for the changelog

Summary by CodeRabbit

  • New Features
  • Bootloader update support for CoreOS during layered OS updates; the system detects bootloader state and applies the appropriate update action automatically.
  • Bug Fixes
  • If bootloader status or update steps cannot be determined, the update is skipped safely with a logged warning so overall OS update proceeds.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/daemon/update.go (1)

2816-2826: Consider capturing stderr for better diagnostics.

When bootupctl status fails, stderr (which often contains the actual error message) is discarded. Using CombinedOutput() would preserve it in the warning log.

🔧 Proposed fix
 	statusCmd := exec.Command("podman", "run",
 		"--privileged", "--pid=host", "--net=host", "--rm",
 		"-v", "/boot:/boot:rslave",
 		"-v", "/dev:/dev",
 		imageURL, "bootupctl", "status", "--json",
 	)
-	statusOut, err := statusCmd.Output()
+	statusOut, err := statusCmd.CombinedOutput()
 	if err != nil {
-		klog.Warningf("bootupctl status returned error: %v; skipping bootloader update", err)
+		klog.Warningf("bootupctl status returned error: %v; output: %s; skipping bootloader update", err, string(statusOut))
 		return nil
 	}

Note: The choice to skip (return nil) rather than fail on status errors was discussed in prior review comments. If the intent is to fail-safe when bootupctl isn't available in older images, this is reasonable but should be documented in a code comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/update.go` around lines 2816 - 2826, The current exec.Command call
that creates statusCmd and calls statusCmd.Output() discards stderr; change to
use statusCmd.CombinedOutput() (or cmd.CombinedOutput()) to capture stderr and
include that combined output in the klog.Warningf message when bootupctl status
fails so you log the real error text from bootupctl, and add a brief inline
comment near the status check explaining that returning nil (skipping update) is
intentional to be fail-safe for older images.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/daemon/update.go`:
- Around line 2816-2826: The current exec.Command call that creates statusCmd
and calls statusCmd.Output() discards stderr; change to use
statusCmd.CombinedOutput() (or cmd.CombinedOutput()) to capture stderr and
include that combined output in the klog.Warningf message when bootupctl status
fails so you log the real error text from bootupctl, and add a brief inline
comment near the status check explaining that returning nil (skipping update) is
intentional to be fail-safe for older images.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8117867b-24f9-401b-b45d-b5a499349d3e

📥 Commits

Reviewing files that changed from the base of the PR and between 07db816 and cf372ab.

📒 Files selected for processing (1)
  • pkg/daemon/update.go

@travier
Member

travier commented Apr 21, 2026

So I don't think that's going to work as-is: bootupd mounts the ESP dynamically as needed for RAID setups, so it needs access to /dev.

Would mounting /dev be sufficient to fix that?

@travier - I've added the /dev mount in the calls, but happy to change the approach if this isn't viable.

Probably, but I haven't checked that. I would recommend running those commands on a single node first to flush out the approach and make sure we do things properly.

Comment thread pkg/daemon/update.go
klog.Infof("Updating OS to layered image %q", newURL)

if err := dn.runBootupdViaContainer(newURL); err != nil {
klog.Warningf("bootloader update failed: %s", err)
Contributor Author


Note that a failure to update the bootloader is currently non-fatal; the MCD will proceed with the OS update regardless.

Since the new bootloader is only critical for a secure boot cluster, this seemed like a safer choice so as not to block regular clusters. Not strongly held though, happy to discuss further. cc @yuqi-zhang @travier

Contributor


I'm +1 on the non-fatal path since this is mostly there as a short-term patch to try to get affected cluster nodes to be upgradeable. I'd prefer we don't fail unaffected clusters, since affected clusters will fail regardless, even if the already-rebooted case is a bit more annoying to fix.

@djoshy
Contributor Author

djoshy commented Apr 24, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/daemon/bootupd_test.go (1)

118-136: Consider adding an integration test for findESPDevices RAID detection.

The unit test for findRAIDESPs passes /dev/md0 directly, but the actual usage in findESPDevices relies on findmnt returning the boot device and then checking bootPart.FSType == "linux_raid_member". Since the md device itself doesn't have FSType "linux_raid_member" in the test data (only the member partitions do), the RAID detection path in findESPDevices may not be exercised correctly.

Consider adding a test that validates the full findESPDevices flow with RAID configurations, potentially by mocking the runGetOut calls or refactoring to accept the lsblk/findmnt results as parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/bootupd_test.go` around lines 118 - 136, Add an integration test
exercising the full findESPDevices flow for RAID: mock or inject the external
command results used by findESPDevices (the runGetOut/findmnt output and
lsblk-like disk tree) so that findmnt returns a boot device whose
bootPart.FSType == "linux_raid_member" and the lsblk data contains member
partitions pointing to an md device; then call findESPDevices and assert it
returns the same ESP replicas that findRAIDESPs would for that md device. Locate
code paths in findESPDevices, findRAIDESPs and the runGetOut/findmnt usage to
implement the mocks or refactor to accept test fixtures and verify RAID
detection is actually exercised.
pkg/daemon/bootupd.go (1)

222-230: Minor: sha256sum command could fail on unusual filenames.

The find | xargs sha256sum pattern may fail on filenames with spaces or special characters. While EFI paths typically don't contain such characters, a more robust approach would use null-delimited output:

find /boot/efi -type f -print0 | xargs -0 sha256sum

However, given this is logging/debugging output and EFI paths are standardized, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/bootupd.go` around lines 222 - 230, Replace the fragile `find
/boot/efi -type f | xargs sha256sum` usage in the calls that populate beforeSums
and afterSums with a null-delimited variant to handle filenames with
spaces/special chars: use `find /boot/efi -type f -print0 | xargs -0 sha256sum`
when invoking runGetOut (and ensure any error is still captured/preserved for
logging via logSystem); update the runGetOut invocations that set beforeSums and
afterSums so they run the safer command string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 168-176: The RAID detection misses when /boot is on an md device
because bootPart.FSType will be the filesystem (e.g., "ext4") for /dev/mdX;
update the logic around findDeviceByName, bootPart and findRAIDESPs to detect
RAID not only by FSType but also when bootDev is an md device or has a parent
with FSType "linux_raid_member": if bootDev starts with "/dev/md" or traversing
blk.BlockDevices from the md device up finds a parent with FSType
"linux_raid_member", call findRAIDESPs(blk.BlockDevices, bootDev) and handle the
empty-replicas error as shown; ensure the new condition covers both the existing
bootPart.FSType == "linux_raid_member" case and these md-parent cases so ESP
replicas on RAID member disks are updated.
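The RAID case is easier to see in lsblk's tree: when /boot is an md RAID1 device, that md device appears as a child of each `linux_raid_member` partition. Assuming, as the comment implies, that each member disk carries its own ESP replica, a hypothetical sketch of matching on that shape could look like this (names and fixture layout are illustrative, not the PR's `findRAIDESPs`):

```go
package main

import "strings"

// espGUID is the GPT partition type GUID of an EFI System Partition.
const espGUID = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

// dev mirrors a subset of an `lsblk --json -o NAME,FSTYPE,PARTTYPE` node.
type dev struct {
	Name     string `json:"name"`
	FSType   string `json:"fstype"`
	PartType string `json:"parttype"`
	Children []dev  `json:"children"`
}

// raidESPs returns the ESPs on every disk that contributes a
// linux_raid_member partition to mdName. The md device's own FSType
// (e.g. ext4) is never inspected, which is the gap the comment points out.
func raidESPs(disks []dev, mdName string) []string {
	var esps []string
	for _, disk := range disks {
		member := false
		for _, part := range disk.Children {
			if part.FSType != "linux_raid_member" {
				continue
			}
			for _, child := range part.Children {
				if child.Name == mdName {
					member = true
				}
			}
		}
		if !member {
			continue
		}
		for _, part := range disk.Children {
			if strings.EqualFold(part.PartType, espGUID) {
				esps = append(esps, part.Name)
			}
		}
	}
	return esps
}
```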

---

Nitpick comments:
In `@pkg/daemon/bootupd_test.go`:
- Around line 118-136: Add an integration test exercising the full
findESPDevices flow for RAID: mock or inject the external command results used
by findESPDevices (the runGetOut/findmnt output and lsblk-like disk tree) so
that findmnt returns a boot device whose bootPart.FSType == "linux_raid_member"
and the lsblk data contains member partitions pointing to an md device; then
call findESPDevices and assert it returns the same ESP replicas that
findRAIDESPs would for that md device. Locate code paths in findESPDevices,
findRAIDESPs and the runGetOut/findmnt usage to implement the mocks or refactor
to accept test fixtures and verify RAID detection is actually exercised.

In `@pkg/daemon/bootupd.go`:
- Around line 222-230: Replace the fragile `find /boot/efi -type f | xargs
sha256sum` usage in the calls that populate beforeSums and afterSums with a
null-delimited variant to handle filenames with spaces/special chars: use `find
/boot/efi -type f -print0 | xargs -0 sha256sum` when invoking runGetOut (and
ensure any error is still captured/preserved for logging via logSystem); update
the runGetOut invocations that set beforeSums and afterSums so they run the
safer command string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c0fc055-3e23-4aad-8065-2ed307894a54

📥 Commits

Reviewing files that changed from the base of the PR and between 74d6d3d and 69f3a22.

📒 Files selected for processing (3)
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go

Comment thread pkg/daemon/bootupd.go Outdated
@djoshy djoshy force-pushed the bootloader-update branch 2 times, most recently from de2477e to 923f4db Compare April 24, 2026 17:27
@djoshy
Contributor Author

djoshy commented Apr 24, 2026

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Reviews resumed.

@djoshy djoshy force-pushed the bootloader-update branch 2 times, most recently from 1011d8f to b4f9aeb Compare April 24, 2026 17:56

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 205-208: The deferred umount call is currently ignored; change
copyEFIToESP (and any caller paths like runBootupdFallback) to propagate umount
failures by converting the defer into a closure that captures and sets the
function's return error (use a named return err variable) — e.g. replace defer
runCmdSync("umount", mountPoint) with defer func() { if e :=
runCmdSync("umount", mountPoint); e != nil && err == nil { err =
fmt.Errorf("unmounting %s: %w", mountPoint, e) } }() so runCmdSync("umount",
...) errors surface to callers and break the fallback sequence when unmount
fails.

In `@pkg/daemon/update.go`:
- Around line 2759-2761: The current code logs and continues when
dn.runBootupdViaContainer(newURL) fails, which risks rebasing an image the
existing Secure Boot loader can't verify; change the behavior to return the
error from the enclosing update flow instead of just logging it, i.e. replace
the klog.Warningf branch with an immediate return of the error from the caller
(propagate the error up); if you must continue on failure, only suppress/ignore
the error after explicitly confirming Secure Boot is disabled (implement or call
a check like dn.isSecureBootDisabled() or equivalent that queries
firmware/efivar and only continue when it positively reports disabled).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1510fc12-78b5-4752-b1a0-ddf8624ad6b7

📥 Commits

Reviewing files that changed from the base of the PR and between 1011d8f and b4f9aeb.

📒 Files selected for processing (3)
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/daemon/bootupd_test.go

Comment thread pkg/daemon/bootupd.go Outdated
Comment thread pkg/daemon/update.go
@djoshy djoshy force-pushed the bootloader-update branch 3 times, most recently from 2b0b3c3 to 4ab14bd Compare April 24, 2026 18:28

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 160-187: The code obtains bootDev from findmnt but compares it
directly against lsblk entries, which can be symlinked names; canonicalize the
boot device path and the lsblk device paths before matching to ensure aliases
(e.g., /dev/disk/by-*) resolve to the actual block device. Modify the logic
around findBootPartition and findESPOnBootDisk to resolve bootDev (use
filepath.EvalSymlinks or equivalent) and normalize each blk.BlockDevices
entry/device.Path before doing equality or parent-child checks so lookups
succeed for symlinked or aliased device names.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b7d4f3a3-e895-42ec-b3e4-5c130a4ba6b6

📥 Commits

Reviewing files that changed from the base of the PR and between b4f9aeb and 4ab14bd.

📒 Files selected for processing (4)
  • .snyk
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
  • .snyk
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go

Comment thread pkg/daemon/bootupd.go
@djoshy djoshy force-pushed the bootloader-update branch from 4ab14bd to d5dd9b4 Compare April 24, 2026 18:46
@djoshy
Contributor Author

djoshy commented Apr 24, 2026

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Reviews resumed.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/daemon/bootupd.go (1)

147-202: Extract the ESP selection logic into a pure helper.

The risky behavior here is the orchestration between lsblk, findmnt, symlink resolution, and the RAID/multi-disk branching, but the new tests only cover the leaf helpers. Pulling the decision logic behind Line 147 into something like selectESPDevices(blockDevices, bootDev) would make the actual fallback selection path unit-testable instead of only testing pieces of it.
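Below is a minimal sketch of the suggested pure helper. It assumes an lsblkDevice shape like the fixtures in pkg/daemon/bootupd_test.go (Name, PartType, FSType, Children). It deliberately omits the RAID branch and covers only one rule: prefer ESPs on the disk that holds /boot, otherwise fall back to any ESP. espPartType is the standard GPT EFI System Partition type GUID.

```go
package main

import (
	"errors"
	"strings"
)

// espPartType is the GPT partition type GUID of an EFI System Partition.
const espPartType = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

type lsblkDevice struct {
	Name     string
	PartType string
	FSType   string
	Children []lsblkDevice
}

func isESP(d lsblkDevice) bool { return strings.EqualFold(d.PartType, espPartType) }

// contains reports whether name appears anywhere in d's subtree, including d.
func contains(d lsblkDevice, name string) bool {
	if d.Name == name {
		return true
	}
	for _, c := range d.Children {
		if contains(c, name) {
			return true
		}
	}
	return false
}

// selectESPDevices picks the ESPs to update without running any commands:
// all ESPs on the disk containing bootDev if it has any, otherwise the first
// ESP found on any disk.
func selectESPDevices(disks []lsblkDevice, bootDev string) ([]string, error) {
	var fallback string
	for _, disk := range disks {
		var esps []string
		for _, part := range disk.Children {
			if isESP(part) {
				esps = append(esps, part.Name)
			}
		}
		if len(esps) > 0 && contains(disk, bootDev) {
			return esps, nil
		}
		if fallback == "" && len(esps) > 0 {
			fallback = esps[0]
		}
	}
	if fallback != "" {
		return []string{fallback}, nil
	}
	return nil, errors.New("no EFI System Partition found")
}
```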

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/bootupd.go` around lines 147 - 202, The ESP selection and
branching logic inside findESPDevices should be extracted into a pure helper
function selectESPDevices(blockDevices []BlockDevice, bootDev string) ([]string,
error); move the decision code that starts from computing bootPart :=
findBootPartition(...) through the final findAnyESP(...) return into that new
function, keeping use of the existing helpers (findBootPartition, findRAIDESPs,
countDisksWithESP, findESPOnBootDisk, findAnyESP), and have findESPDevices call
selectESPDevices after resolving bootDev and parsing lsblk output so the
selection logic becomes unit-testable without running lsblk/findmnt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 109-115: The hard-coded container name "bootupd-efi-extract" in
runBootupdFallback causes collisions; generate a unique container name (e.g.,
append a timestamp or short UUID) before calling
dn.podmanInterface.CreatePodmanContainer and pass that variable instead of the
literal, use the returned containerID as you do now, and change the deferred
cleanup (runCmdSync("podman", "rm", "-f", ...)) to remove either the containerID
or the unique name to ensure stale containers don't block future runs; update
the references in runBootupdFallback (containerOut, containerID, and the defer)
accordingly.

---

Nitpick comments:
In `@pkg/daemon/bootupd.go`:
- Around line 147-202: The ESP selection and branching logic inside
findESPDevices should be extracted into a pure helper function
selectESPDevices(blockDevices []BlockDevice, bootDev string) ([]string, error);
move the decision code that starts from computing bootPart :=
findBootPartition(...) through the final findAnyESP(...) return into that new
function, keeping use of the existing helpers (findBootPartition, findRAIDESPs,
countDisksWithESP, findESPOnBootDisk, findAnyESP), and have findESPDevices call
selectESPDevices after resolving bootDev and parsing lsblk output so the
selection logic becomes unit-testable without running lsblk/findmnt.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2230788f-fb9e-4516-8c55-5bfa339ab7e6

📥 Commits

Reviewing files that changed from the base of the PR and between 4ab14bd and d5dd9b4.

📒 Files selected for processing (4)
  • .snyk
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • .snyk
  • pkg/daemon/update.go

Comment thread pkg/daemon/bootupd.go
@djoshy
Contributor Author

djoshy commented Apr 27, 2026

/pipeline required

@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@sergiordlr
Contributor

sergiordlr commented Apr 28, 2026

Verification: to verify that new nodes can be created, we will use these images:

4.14: projects/rhcos-cloud/global/images/rhcos-414-92-202402130420-0-gcp-x86-64
4.15: projects/rhcos-cloud/global/images/rhcos-415-92-202402201450-0-gcp-x86-64
4.13: projects/rhcos-cloud/global/images/rhcos-413-92-202402131523-0-gcp-x86-64
4.17: projects/rhcos-cloud/global/images/rhcos-417-94-202601272318-0-gcp-x86-64

In a connected IPI cluster on GCP with Secure Boot, without a proxy:

We were able to create new nodes using the 4.17, 4.15, 4.14, and 4.13 images listed above, with both confidentialCompute: AMDEncryptedVirtualizationNestedPaging and confidentialCompute: Enabled.

4.13 images only support confidentialCompute: Enabled, which is why the 4.13 machine created with AMDEncryptedVirtualizationNestedPaging shows as Failed.

$ oc get machine -n openshift-machine-api
....
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-13-cc-wljj5   Running   n2d-standard-2   us-central1   us-central1-a   9m19s
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-13-w5hck      Failed                                                   19m
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs   Running   n2d-standard-2   us-central1   us-central1-a   9m19s
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cvldq      Running   n2d-standard-2   us-central1   us-central1-a   19m
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-15-cc-4z922   Running   n2d-standard-2   us-central1   us-central1-a   9m19s
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-15-kscbz      Running   n2d-standard-2   us-central1   us-central1-a   19m
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-17-cc-xq2gw   Running   n2d-standard-2   us-central1   us-central1-a   9m19s
ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-17-txkk9      Running   n2d-standard-2   us-central1   us-central1-a   19m

In the machine-config-daemon-firstboot.service logs, we can see the bootloader update logic:

Tue 2026-04-28 14:02:56 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:02:56.443687    2514 update.go:2956] "runBootupdViaContainer: attempting bootloader update"
Tue 2026-04-28 14:02:56 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:02:56.445860    2514 command_runner.go:24] Running captured: podman images --format=json --filter reference=registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3
Tue 2026-04-28 14:02:56 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2493]: I0428 14:02:56.445860    2514 command_runner.go:24] Running captured: podman images --format=json --filter reference=registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3
Tue 2026-04-28 14:02:56 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:02:56.503012    2514 update.go:2911] Running: systemd-run --unit machine-config-daemon-bootupd -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman pull --authfile /var/lib/kubelet/config.json registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3
Tue 2026-04-28 14:02:56 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2493]: I0428 14:02:56.503012    2514 update.go:2911] Running: systemd-run --unit machine-config-daemon-bootupd -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman pull --authfile /var/lib/kubelet/config.json registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:05:41.549737    2514 update.go:2956] "bootupctl: running update from container image"
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2493]: I0428 14:05:41.549737    2514 update.go:2956] "bootupctl: running update from container image"
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:05:41.552297    2514 update.go:2911] Running: systemd-run --unit machine-config-daemon-bootupd-update -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman run --env INVOCATION_ID=aa143221-4869-450d-bd9a-8de33f6a0b55 --privileged --pid=host --net=host --rm -v /boot:/boot:rslave -v /dev:/dev registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3 bootupctl update
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2493]: I0428 14:05:41.552297    2514 update.go:2911] Running: systemd-run --unit machine-config-daemon-bootupd-update -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman run --env INVOCATION_ID=aa143221-4869-450d-bd9a-8de33f6a0b55 --privileged --pid=host --net=host --rm -v /boot:/boot:rslave -v /dev:/dev registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3 bootupctl update
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2512]: I0428 14:05:41.931048    2514 update.go:2956] "bootupctl update failed (error running systemd-run --unit machine-config-daemon-bootupd-update -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman run --env INVOCATION_ID=aa143221-4869-450d-bd9a-8de33f6a0b55 --privileged --pid=host --net=host --rm -v /boot:/boot:rslave -v /dev:/dev registry.build10.ci.openshift.org/ci-ln-vgwfnw2/stable@sha256:241ad6ff677c9aa2e51e8bfce9edbf018c5b185788c07d0e866ea1c6b80db5d3 bootupctl update: Running as unit: machine-config-daemon-bootupd-update.service\nFinished with result: exit-code\nMain processes terminated with: code=exited/status=1\nService runtime: 368ms\nCPU time consumed: 202ms\n: exit status 1); falling back to manual EFI copy"
Tue 2026-04-28 14:05:41 UTC ci-op-fjpmzfqt-ef7bc-7knf2-worker-4-14-cc-wmjrs machine-config-daemon-firstboot.service[2493]: I0428 14:05:41.931048    2514 updat

In a disconnected IPI cluster on GCP with Secure Boot, without a proxy:

Same results, all nodes were created without problems.

@djoshy djoshy force-pushed the bootloader-update branch 2 times, most recently from 7ae77f4 to 8065acf Compare April 28, 2026 17:29

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/daemon/bootupd_test.go (1)

80-98: Add a non-empty “no ESP” boundary case for ESP helpers.

Line 81 and Line 90 currently validate empty/non-empty happy paths, but not a non-empty tree with zero ESP partitions. Adding that case would harden regression coverage.

✅ Suggested test additions
 func TestCountDisksWithESP(t *testing.T) {
 	if n := countDisksWithESP(singleDiskTree); n != 1 {
 		t.Errorf("single disk: expected 1, got %d", n)
 	}
 	if n := countDisksWithESP(multiDiskTree); n != 2 {
 		t.Errorf("multi disk: expected 2, got %d", n)
 	}
+	noESPTree := []lsblkDevice{
+		{Name: "/dev/vda", Children: []lsblkDevice{
+			{Name: "/dev/vda1", PartType: "0fc63daf-8483-4772-8e79-3d69d8477de4", FSType: "ext4"},
+		}},
+	}
+	if n := countDisksWithESP(noESPTree); n != 0 {
+		t.Errorf("no esp: expected 0, got %d", n)
+	}
 }
 
 func TestFindAnyESP(t *testing.T) {
 	esp := findAnyESP(singleDiskTree)
 	if esp != "/dev/sda2" {
 		t.Errorf("expected /dev/sda2, got %q", esp)
 	}
 
+	noESPTree := []lsblkDevice{
+		{Name: "/dev/vda", Children: []lsblkDevice{
+			{Name: "/dev/vda1", PartType: "0fc63daf-8483-4772-8e79-3d69d8477de4", FSType: "ext4"},
+		}},
+	}
+	if esp := findAnyESP(noESPTree); esp != "" {
+		t.Errorf("expected empty for tree without ESP, got %q", esp)
+	}
+
 	if esp := findAnyESP([]lsblkDevice{}); esp != "" {
 		t.Errorf("expected empty for no devices, got %q", esp)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/bootupd_test.go` around lines 80 - 98, Add a non-empty tree with
zero ESP partitions to both tests: in TestCountDisksWithESP create a fixture
(e.g., a slice of lsblkDevice describing disks/partitions none of which have
PARTLABEL/TYPE indicating an EFI System Partition) and assert
countDisksWithESP(...) returns 0; in TestFindAnyESP add the same non-empty,
no-ESP fixture and assert findAnyESP(...) returns an empty string. Reference the
existing tests TestCountDisksWithESP and TestFindAnyESP and the helper functions
countDisksWithESP and findAnyESP when adding the new assertions so the boundary
case of a non-empty tree with no ESP is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/daemon/bootupd_test.go`:
- Around line 80-98: Add a non-empty tree with zero ESP partitions to both
tests: in TestCountDisksWithESP create a fixture (e.g., a slice of lsblkDevice
describing disks/partitions none of which have PARTLABEL/TYPE indicating an EFI
System Partition) and assert countDisksWithESP(...) returns 0; in TestFindAnyESP
add the same non-empty, no-ESP fixture and assert findAnyESP(...) returns an
empty string. Reference the existing tests TestCountDisksWithESP and
TestFindAnyESP and the helper functions countDisksWithESP and findAnyESP when
adding the new assertions so the boundary case of a non-empty tree with no ESP
is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d0199b17-95b2-4408-b6f1-e430a8514cb9

📥 Commits

Reviewing files that changed from the base of the PR and between d5dd9b4 and 8065acf.

📒 Files selected for processing (4)
  • .snyk
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go
✅ Files skipped from review due to trivial changes (2)
  • .snyk
  • pkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/daemon/bootupd.go

@djoshy djoshy force-pushed the bootloader-update branch from 8065acf to 2d1609e Compare April 28, 2026 17:47

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
pkg/daemon/bootupd.go (1)

115-120: ⚠️ Potential issue | 🟠 Major

Use a per-run extraction container name.

runBootupdFallback() still creates bootupd-efi-extract with a fixed name while cleanup is best-effort. If the daemon exits before the defer runs, or podman rm -f fails, the next fallback will collide at create time and updateLayeredOS() will only warn and continue, leaving later upgrades without a bootloader update.

Suggested change
 func (dn *Daemon) runBootupdFallback(imageURL string) error {
-	containerOut, err := dn.podmanInterface.CreatePodmanContainer(nil, "bootupd-efi-extract", imageURL)
+	containerName := "bootupd-efi-extract-" + string(uuid.NewUUID())
+	containerOut, err := dn.podmanInterface.CreatePodmanContainer(nil, containerName, imageURL)
 	if err != nil {
 		return fmt.Errorf("creating container for EFI extraction: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/bootupd.go` around lines 115 - 120, runBootupdFallback currently
uses a fixed container name "bootupd-efi-extract" which can collide between
runs; change it to generate a unique per-run name (e.g., append a UUID or
timestamp) before calling dn.podmanInterface.CreatePodmanContainer, pass that
generatedName instead of the literal, assign containerID from containerOut as
before, and use that same variable in the deferred cleanup call to
runCmdSync("podman", "rm", "-f", containerID) so cleanup targets the correct
per-run container and avoids create-time collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/daemon/bootupd.go`:
- Around line 115-120: runBootupdFallback currently uses a fixed container name
"bootupd-efi-extract" which can collide between runs; change it to generate a
unique per-run name (e.g., append a UUID or timestamp) before calling
dn.podmanInterface.CreatePodmanContainer, pass that generatedName instead of the
literal, assign containerID from containerOut as before, and use that same
variable in the deferred cleanup call to runCmdSync("podman", "rm", "-f",
containerID) so cleanup targets the correct per-run container and avoids
create-time collisions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 84f19423-50e0-471a-9cbf-8a997efd98a5

📥 Commits

Reviewing files that changed from the base of the PR and between 8065acf and 2d1609e.

📒 Files selected for processing (4)
  • .snyk
  • pkg/daemon/bootupd.go
  • pkg/daemon/bootupd_test.go
  • pkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • .snyk
  • pkg/daemon/update.go
  • pkg/daemon/bootupd_test.go

@djoshy
Contributor Author

djoshy commented Apr 28, 2026

Added a couple of fixes based on Sergio's testing in image mode:

  • Use the internal registry secret file when invoking podman pull; the regular kubelet auth file is not sufficient if the node is targeting an image in the internal registry.
  • Do not exit early when the local storage check fails.

@sergiordlr plans to re-test these scenarios and a few more tomorrow.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 28, 2026

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                 Commit   Details  Required  Rerun command
ci/prow/e2e-gcp-op-ocl    42056b2  link     false     /test e2e-gcp-op-ocl
ci/prow/images            2d1609e  link     true      /test images
ci/prow/okd-scos-images   2d1609e  link     true      /test okd-scos-images

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

5 participants