OCPBUGS-83871: Support bootloader update#5868
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
WalkthroughAttempt a bootloader update early in updateLayeredOS by running a containerized Changes
Sequence Diagram(s)sequenceDiagram
participant Updater as updateLayeredOS()
participant Daemon as runBootupdViaContainer()
participant Systemd as systemd-run
participant Podman as podman
participant Container as bootupctl (in image)
participant Host as Host FS/tools
Updater->>Daemon: request bootloader update for imageURL
Daemon->>Podman: check local image
alt image missing
Daemon->>Podman: podman pull (with authfiles)
end
Daemon->>Systemd: systemd-run podman run --privileged --pid=host --net=host --rm (mounts, INVOCATION_ID) bootupctl update
Systemd->>Podman: start container
Podman->>Container: run bootupctl update (access /boot)
alt container run succeeds
Container-->>Daemon: success
else container run fails
Container-->>Daemon: error (logged)
Daemon->>Host: extract EFI artifacts from image to tmp
Daemon->>Host: query `lsblk --json` and `findmnt /boot`
Daemon->>Host: mount ESPs, remove orphan dirs, log sha256, copy EFI files, sync, unmount
end
Daemon-->>Updater: return (warning or nil)
sequenceDiagram
participant Daemon as runBootupdViaContainer()
participant Image as Image Registry/Local Store
participant Extract as extractor (container/tar)
participant LSBLK as lsblk --json
participant Mount as mount/umount
participant Copy as cp/rsync + sha256
Daemon->>Image: ensure image available (pull if needed)
Daemon->>Extract: extract `/usr/lib/bootupd/updates/EFI` to tmp
Daemon->>LSBLK: query block devices and `/boot` mapping
LSBLK-->>Daemon: device topology JSON
Daemon->>Mount: mount candidate ESPs at `/boot/efi` (handle RAID members)
Daemon->>Copy: remove `.btmp.*`, log sha256 checksums, copy EFI contents
Copy->>Mount: sync
Daemon->>Mount: unmount cleanup
Daemon-->>Daemon: return to caller
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
|
@djoshy: This pull request references MCO-2218 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| // -v /boot:/boot:rslave mounts the host's /boot directory at /boot inside the | ||
| // container. The :rslave ("recursive slave") propagation option is required | ||
| // because /boot/efi is a separate filesystem (the EFI System Partition) mounted | ||
| // inside /boot on the host. A plain bind mount would expose the /boot directory | ||
| // but not that nested mount; rslave propagates sub-mounts from the host into the | ||
| // container, making /boot/efi visible where bootupctl expects to read and write | ||
| // bootloader files. bootupctl has no --sysroot flag, so it always operates on | ||
| // the real /boot path. |
There was a problem hiding this comment.
Claude insisted I do this volume mount argument so I asked for an explanation. It makes sense to me - but someone more bootloader experienced than me should definitely review :)
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/update.go`:
- Around line 2808-2817: The current logic treats failures from
statusCmd.Output() and json.Unmarshal(...) as non-fatal by returning nil; change
this to propagate the error so bootloader update is enforced: when
statusCmd.Output() returns an error, return that error (or wrap it with context)
instead of logging and returning nil; likewise, on json.Unmarshal failure of
statusOut into bStatus, return the parse error (unless you explicitly detect and
handle a known "unsupported image" sentinel from bootupctl and skip only that
case). Update the error paths around statusCmd and the json.Unmarshal into
bStatus to fail-fast by returning the error with context rather than silently
continuing.
- Around line 2782-2790: The current code in updateLayeredOS unconditionally
builds pullArgs and calls runCmdSync("systemd-run", ...) which forces a network
pull and breaks disconnected/bootstrap flows; instead, first check for a
locally-available image using the existing lookup path
(call/GetPodmanImageInfoByReference or the same local-storage/PIS logic used
later) and only assemble and execute the systemd-run pull command (the block
using systemdPodmanArgs, "--authfile", and conditional "--signature-policy" via
podmanSupportsSigstore()) if the image is not present locally; if the local
image is found, skip the pull and proceed with the existing rebase/local
handling in updateLayeredOS.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 8e9c56b4-06d7-4574-9651-486c1637bbd9
📒 Files selected for processing (1)
pkg/daemon/update.go
|
@djoshy: This pull request references Jira Issue OCPBUGS-83869, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
So I don't think that's going to work as is as bootupd mounts the ESP dynamically as needed for RAID setup so it needs access to |
Would mounting /dev be sufficient to fix that? |
07db816 to
c3a9db7
Compare
|
@djoshy: This pull request references Jira Issue OCPBUGS-83871. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
oops, accidentally closed - just reopened 😅 @travier - I've added the /dev mount in the calls, but happy to change the approach if this isn't viable. /retest |
|
@djoshy: This pull request references Jira Issue OCPBUGS-83871, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/daemon/update.go (1)
2816-2826: Consider capturing stderr for better diagnostics.When
bootupctl statusfails, stderr (which often contains the actual error message) is discarded. UsingCombinedOutput()would preserve it in the warning log.🔧 Proposed fix
statusCmd := exec.Command("podman", "run", "--privileged", "--pid=host", "--net=host", "--rm", "-v", "/boot:/boot:rslave", "-v", "/dev:/dev", imageURL, "bootupctl", "status", "--json", ) - statusOut, err := statusCmd.Output() + statusOut, err := statusCmd.CombinedOutput() if err != nil { - klog.Warningf("bootupctl status returned error: %v; skipping bootloader update", err) + klog.Warningf("bootupctl status returned error: %v; output: %s; skipping bootloader update", err, string(statusOut)) return nil }Note: The choice to skip (return nil) rather than fail on status errors was discussed in prior review comments. If the intent is to fail-safe when bootupctl isn't available in older images, this is reasonable but should be documented in a code comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/update.go` around lines 2816 - 2826, The current exec.Command call that creates statusCmd and calls statusCmd.Output() discards stderr; change to use statusCmd.CombinedOutput() (or cmd.CombinedOutput()) to capture stderr and include that combined output in the klog.Warningf message when bootupctl status fails so you log the real error text from bootupctl, and add a brief inline comment near the status check explaining that returning nil (skipping update) is intentional to be fail-safe for older images.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/daemon/update.go`:
- Around line 2816-2826: The current exec.Command call that creates statusCmd
and calls statusCmd.Output() discards stderr; change to use
statusCmd.CombinedOutput() (or cmd.CombinedOutput()) to capture stderr and
include that combined output in the klog.Warningf message when bootupctl status
fails so you log the real error text from bootupctl, and add a brief inline
comment near the status check explaining that returning nil (skipping update) is
intentional to be fail-safe for older images.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 8117867b-24f9-401b-b45d-b5a499349d3e
📒 Files selected for processing (1)
pkg/daemon/update.go
Probably but I've not checked that. I would recommend running those commands on a single node to flush out the approach first to make sure we do things properly. |
| klog.Infof("Updating OS to layered image %q", newURL) | ||
|
|
||
| if err := dn.runBootupdViaContainer(newURL); err != nil { | ||
| klog.Warningf("bootloader update failed: %s", err) |
There was a problem hiding this comment.
Note that a failure to update the bootloader is currently non fatal, the MCD will proceed to do an OS update regardless.
Since the new bootloader is only critical for a secure boot cluster, this seemed like a safer choice so as not to block regular clusters. Not strongly held though, happy to discuss further. cc @yuqi-zhang @travier
There was a problem hiding this comment.
I'm +1 on the non-fatal path since this is mostly there as a short-term patch to try to get affected cluster nodes to be upgradeable. I'd prefer we don't fail unaffected clusters, since affected clusters will fail regardless, even if the already-rebooted case is a bit more annoying to fix.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/daemon/bootupd_test.go (1)
118-136: Consider adding an integration test forfindESPDevicesRAID detection.The unit test for
findRAIDESPspasses/dev/md0directly, but the actual usage infindESPDevicesrelies onfindmntreturning the boot device and then checkingbootPart.FSType == "linux_raid_member". Since the md device itself doesn't have FSType "linux_raid_member" in the test data (only the member partitions do), the RAID detection path infindESPDevicesmay not be exercised correctly.Consider adding a test that validates the full
findESPDevicesflow with RAID configurations, potentially by mocking therunGetOutcalls or refactoring to accept the lsblk/findmnt results as parameters.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/bootupd_test.go` around lines 118 - 136, Add an integration test exercising the full findESPDevices flow for RAID: mock or inject the external command results used by findESPDevices (the runGetOut/findmnt output and lsblk-like disk tree) so that findmnt returns a boot device whose bootPart.FSType == "linux_raid_member" and the lsblk data contains member partitions pointing to an md device; then call findESPDevices and assert it returns the same ESP replicas that findRAIDESPs would for that md device. Locate code paths in findESPDevices, findRAIDESPs and the runGetOut/findmnt usage to implement the mocks or refactor to accept test fixtures and verify RAID detection is actually exercised.pkg/daemon/bootupd.go (1)
222-230: Minor: sha256sum command could fail on unusual filenames.The
find | xargs sha256sumpattern may fail on filenames with spaces or special characters. While EFI paths typically don't contain such characters, a more robust approach would use null-delimited output:find /boot/efi -type f -print0 | xargs -0 sha256sumHowever, given this is logging/debugging output and EFI paths are standardized, the current approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/bootupd.go` around lines 222 - 230, Replace the fragile `find /boot/efi -type f | xargs sha256sum` usage in the calls that populate beforeSums and afterSums with a null-delimited variant to handle filenames with spaces/special chars: use `find /boot/efi -type f -print0 | xargs -0 sha256sum` when invoking runGetOut (and ensure any error is still captured/preserved for logging via logSystem); update the runGetOut invocations that set beforeSums and afterSums so they run the safer command string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 168-176: The RAID detection misses when /boot is on an md device
because bootPart.FSType will be the filesystem (e.g., "ext4") for /dev/mdX;
update the logic around findDeviceByName, bootPart and findRAIDESPs to detect
RAID not only by FSType but also when bootDev is an md device or has a parent
with FSType "linux_raid_member": if bootDev starts with "/dev/md" or traversing
blk.BlockDevices from the md device up finds a parent with FSType
"linux_raid_member", call findRAIDESPs(blk.BlockDevices, bootDev) and handle the
empty-replicas error as shown; ensure the new condition covers both the existing
bootPart.FSType == "linux_raid_member" case and these md-parent cases so ESP
replicas on RAID member disks are updated.
---
Nitpick comments:
In `@pkg/daemon/bootupd_test.go`:
- Around line 118-136: Add an integration test exercising the full
findESPDevices flow for RAID: mock or inject the external command results used
by findESPDevices (the runGetOut/findmnt output and lsblk-like disk tree) so
that findmnt returns a boot device whose bootPart.FSType == "linux_raid_member"
and the lsblk data contains member partitions pointing to an md device; then
call findESPDevices and assert it returns the same ESP replicas that
findRAIDESPs would for that md device. Locate code paths in findESPDevices,
findRAIDESPs and the runGetOut/findmnt usage to implement the mocks or refactor
to accept test fixtures and verify RAID detection is actually exercised.
In `@pkg/daemon/bootupd.go`:
- Around line 222-230: Replace the fragile `find /boot/efi -type f | xargs
sha256sum` usage in the calls that populate beforeSums and afterSums with a
null-delimited variant to handle filenames with spaces/special chars: use `find
/boot/efi -type f -print0 | xargs -0 sha256sum` when invoking runGetOut (and
ensure any error is still captured/preserved for logging via logSystem); update
the runGetOut invocations that set beforeSums and afterSums so they run the
safer command string.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c0fc055-3e23-4aad-8065-2ed307894a54
📒 Files selected for processing (3)
pkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
de2477e to
923f4db
Compare
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
1011d8f to
b4f9aeb
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 205-208: The deferred umount call is currently ignored; change
copyEFIToESP (and any caller paths like runBootupdFallback) to propagate umount
failures by converting the defer into a closure that captures and sets the
function's return error (use a named return err variable) — e.g. replace defer
runCmdSync("umount", mountPoint) with defer func() { if e :=
runCmdSync("umount", mountPoint); e != nil && err == nil { err =
fmt.Errorf("unmounting %s: %w", mountPoint, e) } }() so runCmdSync("umount",
...) errors surface to callers and break the fallback sequence when unmount
fails.
In `@pkg/daemon/update.go`:
- Around line 2759-2761: The current code logs and continues when
dn.runBootupdViaContainer(newURL) fails, which risks rebasing an image the
existing Secure Boot loader can't verify; change the behavior to return the
error from the enclosing update flow instead of just logging it, i.e. replace
the klog.Warningf branch with an immediate return of the error from the caller
(propagate the error up); if you must continue on failure, only suppress/ignore
the error after explicitly confirming Secure Boot is disabled (implement or call
a check like dn.isSecureBootDisabled() or equivalent that queries
firmware/efivar and only continue when it positively reports disabled).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1510fc12-78b5-4752-b1a0-ddf8624ad6b7
📒 Files selected for processing (3)
pkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
- pkg/daemon/bootupd_test.go
2b0b3c3 to
4ab14bd
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 160-187: The code obtains bootDev from findmnt but compares it
directly against lsblk entries, which can be symlinked names; canonicalize the
boot device path and the lsblk device paths before matching to ensure aliases
(e.g., /dev/disk/by-*) resolve to the actual block device. Modify the logic
around findBootPartition and findESPOnBootDisk to resolve bootDev (use
filepath.EvalSymlinks or equivalent) and normalize each blk.BlockDevices
entry/device.Path before doing equality or parent-child checks so lookups
succeed for symlinked or aliased device names.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b7d4f3a3-e895-42ec-b3e4-5c130a4ba6b6
📒 Files selected for processing (4)
.snykpkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
- .snyk
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/daemon/bootupd_test.go
- pkg/daemon/update.go
4ab14bd to
d5dd9b4
Compare
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/daemon/bootupd.go (1)
147-202: Extract the ESP selection logic into a pure helper.The risky behavior here is the orchestration between
lsblk,findmnt, symlink resolution, and the RAID/multi-disk branching, but the new tests only cover the leaf helpers. Pulling the decision logic behind Line 147 into something likeselectESPDevices(blockDevices, bootDev)would make the actual fallback selection path unit-testable instead of only testing pieces of it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/bootupd.go` around lines 147 - 202, The ESP selection and branching logic inside findESPDevices should be extracted into a pure helper function selectESPDevices(blockDevices []BlockDevice, bootDev string) ([]string, error); move the decision code that starts from computing bootPart := findBootPartition(...) through the final findAnyESP(...) return into that new function, keeping use of the existing helpers (findBootPartition, findRAIDESPs, countDisksWithESP, findESPOnBootDisk, findAnyESP), and have findESPDevices call selectESPDevices after resolving bootDev and parsing lsblk output so the selection logic becomes unit-testable without running lsblk/findmnt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 109-115: The hard-coded container name "bootupd-efi-extract" in
runBootupdFallback causes collisions; generate a unique container name (e.g.,
append a timestamp or short UUID) before calling
dn.podmanInterface.CreatePodmanContainer and pass that variable instead of the
literal, use the returned containerID as you do now, and change the deferred
cleanup (runCmdSync("podman", "rm", "-f", ...)) to remove either the containerID
or the unique name to ensure stale containers don't block future runs; update
the references in runBootupdFallback (containerOut, containerID, and the defer)
accordingly.
---
Nitpick comments:
In `@pkg/daemon/bootupd.go`:
- Around line 147-202: The ESP selection and branching logic inside
findESPDevices should be extracted into a pure helper function
selectESPDevices(blockDevices []BlockDevice, bootDev string) ([]string, error);
move the decision code that starts from computing bootPart :=
findBootPartition(...) through the final findAnyESP(...) return into that new
function, keeping use of the existing helpers (findBootPartition, findRAIDESPs,
countDisksWithESP, findESPOnBootDisk, findAnyESP), and have findESPDevices call
selectESPDevices after resolving bootDev and parsing lsblk output so the
selection logic becomes unit-testable without running lsblk/findmnt.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2230788f-fb9e-4516-8c55-5bfa339ab7e6
📒 Files selected for processing (4)
.snykpkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (2)
- .snyk
- pkg/daemon/update.go
|
/pipeline required |
|
Scheduling tests matching the |
|
Verification. To verify that the new nodes can be created we will use these images In an IPI on GCP secureboot connected cluster without proxyWe were able to create new nodes using the list 4.17, 4.15, 4.14 and 4.13 images. With 4.13 images only support In the machine-config-daemon-firstboot.service we can see the bootloader logic In a disconnecte IPI on GCP secureboot cluster without proxySame results, all nodes were created without problems. |
7ae77f4 to
8065acf
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/daemon/bootupd_test.go (1)
80-98: Add a non-empty “no ESP” boundary case for ESP helpers.Line 81 and Line 90 currently validate empty/non-empty happy paths, but not a non-empty tree with zero ESP partitions. Adding that case would harden regression coverage.
✅ Suggested test additions
func TestCountDisksWithESP(t *testing.T) { if n := countDisksWithESP(singleDiskTree); n != 1 { t.Errorf("single disk: expected 1, got %d", n) } if n := countDisksWithESP(multiDiskTree); n != 2 { t.Errorf("multi disk: expected 2, got %d", n) } + noESPTree := []lsblkDevice{ + {Name: "/dev/vda", Children: []lsblkDevice{ + {Name: "/dev/vda1", PartType: "0fc63daf-8483-4772-8e79-3d69d8477de4", FSType: "ext4"}, + }}, + } + if n := countDisksWithESP(noESPTree); n != 0 { + t.Errorf("no esp: expected 0, got %d", n) + } } func TestFindAnyESP(t *testing.T) { esp := findAnyESP(singleDiskTree) if esp != "/dev/sda2" { t.Errorf("expected /dev/sda2, got %q", esp) } + noESPTree := []lsblkDevice{ + {Name: "/dev/vda", Children: []lsblkDevice{ + {Name: "/dev/vda1", PartType: "0fc63daf-8483-4772-8e79-3d69d8477de4", FSType: "ext4"}, + }}, + } + if esp := findAnyESP(noESPTree); esp != "" { + t.Errorf("expected empty for tree without ESP, got %q", esp) + } + if esp := findAnyESP([]lsblkDevice{}); esp != "" { t.Errorf("expected empty for no devices, got %q", esp) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/bootupd_test.go` around lines 80 - 98, Add a non-empty tree with zero ESP partitions to both tests: in TestCountDisksWithESP create a fixture (e.g., a slice of lsblkDevice describing disks/partitions none of which have PARTLABEL/TYPE indicating an EFI System Partition) and assert countDisksWithESP(...) returns 0; in TestFindAnyESP add the same non-empty, no-ESP fixture and assert findAnyESP(...) returns an empty string. Reference the existing tests TestCountDisksWithESP and TestFindAnyESP and the helper functions countDisksWithESP and findAnyESP when adding the new assertions so the boundary case of a non-empty tree with no ESP is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/daemon/bootupd_test.go`:
- Around line 80-98: Add a non-empty tree with zero ESP partitions to both
tests: in TestCountDisksWithESP create a fixture (e.g., a slice of lsblkDevice
describing disks/partitions none of which have PARTLABEL/TYPE indicating an EFI
System Partition) and assert countDisksWithESP(...) returns 0; in TestFindAnyESP
add the same non-empty, no-ESP fixture and assert findAnyESP(...) returns an
empty string. Reference the existing tests TestCountDisksWithESP and
TestFindAnyESP and the helper functions countDisksWithESP and findAnyESP when
adding the new assertions so the boundary case of a non-empty tree with no ESP
is covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d0199b17-95b2-4408-b6f1-e430a8514cb9
📒 Files selected for processing (4)
.snykpkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
✅ Files skipped from review due to trivial changes (2)
- .snyk
- pkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/daemon/bootupd.go
8065acf to
2d1609e
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/daemon/bootupd.go (1)
115-120:⚠️ Potential issue | 🟠 MajorUse a per-run extraction container name.
runBootupdFallback()still createsbootupd-efi-extractwith a fixed name while cleanup is best-effort. If the daemon exits before the defer runs, orpodman rm -ffails, the next fallback will collide at create time andupdateLayeredOS()will only warn and continue, leaving later upgrades without a bootloader update.Suggested change
func (dn *Daemon) runBootupdFallback(imageURL string) error { - containerOut, err := dn.podmanInterface.CreatePodmanContainer(nil, "bootupd-efi-extract", imageURL) + containerName := "bootupd-efi-extract-" + string(uuid.NewUUID()) + containerOut, err := dn.podmanInterface.CreatePodmanContainer(nil, containerName, imageURL) if err != nil { return fmt.Errorf("creating container for EFI extraction: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/bootupd.go` around lines 115 - 120, runBootupdFallback currently uses a fixed container name "bootupd-efi-extract" which can collide between runs; change it to generate a unique per-run name (e.g., append a UUID or timestamp) before calling dn.podmanInterface.CreatePodmanContainer, pass that generatedName instead of the literal, assign containerID from containerOut as before, and use that same variable in the deferred cleanup call to runCmdSync("podman", "rm", "-f", containerID) so cleanup targets the correct per-run container and avoids create-time collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/daemon/bootupd.go`:
- Around line 115-120: runBootupdFallback currently uses a fixed container name
"bootupd-efi-extract" which can collide between runs; change it to generate a
unique per-run name (e.g., append a UUID or timestamp) before calling
dn.podmanInterface.CreatePodmanContainer, pass that generatedName instead of the
literal, assign containerID from containerOut as before, and use that same
variable in the deferred cleanup call to runCmdSync("podman", "rm", "-f",
containerID) so cleanup targets the correct per-run container and avoids
create-time collisions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 84f19423-50e0-471a-9cbf-8a997efd98a5
📒 Files selected for processing (4)
.snykpkg/daemon/bootupd.gopkg/daemon/bootupd_test.gopkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (3)
- .snyk
- pkg/daemon/update.go
- pkg/daemon/bootupd_test.go
|
Added a couple fixes based on Sergio's testing in image mode:
@sergiordlr plans to re-test these scenarios and a few more tomorrow. |
|
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
- What I did
This PR adds support bootloader updates prior to an OS update from the MCO, based on steps described in the official OCP docs. The
bootupctlcommand is run from the incoming container image, similar to the way we use rpm-ostree when the bootimage is too old. On failure, we fallback to attempting a blunt copy based on the update script from openshift/os#1795.Notes:
aarch64/x86_64nodes, it is disabled on others architecutres as secure boot isn't applicable to them.- How to verify it
A quick test:
An upgrade test:
A longer way to test is start a secure boot cluster on one of the affected OCP versions, chain upgrade to 4.22. This should succeed as well. This comment on the bug lists the affected versions.
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Tests
Chores