[MCO-1997] Add test for osImageStream #5881

Open
ptalgulk01 wants to merge 1 commit into openshift:main from ptalgulk01:add-osimagestream-tests

Conversation

Contributor

@ptalgulk01 ptalgulk01 commented Apr 24, 2026

Test Cases Added

OCP-88122: Validate osImageStream inheritance for custom MCPs
Validates dynamic inheritance behavior in custom MachineConfigPools:

  • Case 1: Custom MCPs inherit worker pool's default osImageStream (rhel-9)
  • Case 2: Dynamic inheritance - an infra MCP created before the worker osImageStream change inherits the new value (rhel-10)
  • Case 3: Custom MCPs can override with explicit osImageStream values

OCP-88708: Verify osstream logs for boot image controller
Validates boot image controller logging for unsupported osImageStream values:

  • Verifies the controller logs appropriate warning messages
  • Changes to a supported stream and verifies the log messages stop
  • Tests CPMS integration with managed boot images

OCP-88203: Verify MOSB triggered when osImageStream is patched
Validates MachineOSBuild resource creation when osImageStream changes:

  • Patches custom MCP osImageStream to rhel-10
  • Verifies MOSB resource is created and completes successfully
  • Validates node uses correct osImage from new stream

OCP-88365: Verify MCP degraded when osstream and osImageURL MC applied
Validates error handling for conflicting osstream and osImageURL configurations:

  • Applies MachineConfig with both osstream and osImageURL fields
  • Verifies MCP enters degraded state with appropriate error message
  • Tests recovery after removing conflicting configuration

Implementation Details

  • Uses custom-machine-config-pool.yaml template for MCPs without osImageStream field (dynamic inheritance)
  • Adds AddNodeToPool() helper method to simplify node addition to custom pools
  • Uses NewController().GetLogs() for boot image controller log verification
  • Validates osImage on nodes matches expected stream values

Summary by CodeRabbit

  • Tests
    • Added test infrastructure for node pool configuration operations with synchronization validation.
    • Added test cases covering OS image stream behavior across pool configurations, including inheritance, updates, conflict detection, and logging scenarios.

Add OCP-88122 test case to validate that custom MachineConfigPools
dynamically inherit osImageStream from worker pool when not explicitly
configured.

Test validates three scenarios:
1. Custom MCPs inherit worker pool's default osImageStream (rhel-9)
2. Custom MCP created before worker pool osImageStream change inherits
   the new value (rhel-10) dynamically
3. Custom MCPs can override with explicit osImageStream values

Also adds AddNodeToPool helper method to simplify adding nodes to
custom pools during testing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Walkthrough

Adds a test helper method for moving nodes into machine config pools and introduces four new disruptive test cases validating osImageStream behavior across MachineConfigPools, MachineOSBuild triggering, conflict detection, and boot-image-controller logging.

Changes

Cohort / File(s): Node Pool Helper Method (test/extended-priv/machineconfigpool.go)
Summary: Adds an AddNodeToPool() helper that applies the pool label to a node and waits for MCP status synchronization (machine count increment and Updated condition).

Cohort / File(s): osImageStream Test Cases (test/extended-priv/mco_osimagestream.go)
Summary: Adds four new Ginkgo test cases: inheritance validation across custom MCPs (OCP-88122), MachineOSBuild triggering on osImageStream patch (OCP-88203), degradation handling for osImageURL/osImageStream conflicts (OCP-88365), and boot-image-controller logging for unsupported streams with CPMS integration (OCP-88708).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 9 passed | ❌ 3 failed

❌ Failed checks (3 warnings)

• Test Structure And Quality: ⚠️ Warning
  Explanation: The pull request contains four test quality issues: an AddNodeToPool race condition, missing MCP recovery verification, incorrect Eventually vs Consistently usage, and unfiltered MachineSet selection.
  Resolution: Fix the AddNodeToPool label check, add explicit recovery assertions in OCP-88365, replace Eventually().ShouldNot() with Consistently().ShouldNot() in OCP-88708, and filter the MachineSet by the worker role label.
• Microshift Test Compatibility: ⚠️ Warning
  Explanation: The PR adds four new e2e tests (OCP-88122, OCP-88203, OCP-88365, OCP-88708) that use machine.openshift.io APIs (MachineConfigPool, MachineOSBuild, MachineSet), which are unavailable on MicroShift.
  Resolution: Add [apigroup:machine.openshift.io] tags to test names or [Skipped:MicroShift] labels to keep these tests from running in MicroShift CI environments.
• Single Node OpenShift (SNO) Test Compatibility: ⚠️ Warning
  Explanation: Tests OCP-88122, OCP-88203, and OCP-88708 use AddNodeToPool() and GetWorkerNodes() to move worker nodes between MachineConfigPools, which would break SNO clusters by moving the only node. No SNO protection mechanisms ([Skipped:SingleReplicaTopology] labels or IsSingleNode() checks) are present.
  Resolution: Add [Skipped:SingleReplicaTopology] labels to multi-node tests, or add exutil.IsSingleNode() checks with g.Skip() guards. Run the tests against the SNO CI job: periodic-ci-openshift-release-master-ci-4.22-e2e-aws-upgrade-ovn-single-node.
✅ Passed checks (9 passed)

• Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
• Title Check: ✅ Passed. The title '[MCO-1997] Add test for osImageStream' partially relates to the changeset. It captures one test file's addition (osImageStream tests) but omits the significant helper method added to MachineConfigPool, making it incomplete for the full scope of changes.
• Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
• Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
• Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
• Stable And Deterministic Test Names: ✅ Passed. All test names are stable and deterministic, with no dynamic information like pod names, timestamps, UUIDs, or IP addresses in test titles.
• Topology-Aware Scheduling Compatibility: ✅ Passed. The PR adds topology-aware test code and manifests for MachineConfigPool resources without introducing pod scheduling constraints that would break SNO, Two-Node, or HyperShift topologies.
• OTE Binary Stdout Contract: ✅ Passed. Both test helper files contain no process-level functions or process-level stdout writes that would violate the OTE Binary Stdout Contract.
• IPv6 And Disconnected Network Test Compatibility: ✅ Passed. Test files contain no hardcoded IPv4 addresses, no IPv4-specific network operations, no external registry references, and no external connectivity requirements at test execution time.


@openshift-ci
Contributor

openshift-ci Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ptalgulk01
Once this PR has been reviewed and has the lgtm label, please assign dkhater-redhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
test/extended-priv/mco_osimagestream.go (2)

317-319: Redundant waitForComplete after AddNodeToPool.

AddNodeToPool already calls WaitImmediateForUpdatedStatus internally, so this explicit waitForComplete adds only additional wait time without additional verification. If the extra wait is intentional to guard against the TDZ-style race flagged in AddNodeToPool, prefer hardening the helper instead of layering waits at each call site.

♻️ Proposed trim
 		exutil.By("Add a node to infra pool")
 		err = infraMcp.AddNodeToPool(mcp.GetSortedNodesOrFail()[0])
 		o.Expect(err).NotTo(o.HaveOccurred(), "Error adding node to infra pool")
 		logger.Infof("OK!\n")
 
-		exutil.By("Wait for infra pool to complete update with the new node")
-		infraMcp.waitForComplete()
-		logger.Infof("OK!\n")
-
 		exutil.By("Verify node in infra pool inherited rhel-10 from worker pool")
 		validateOsImageStreamInPool(infraMcp, osis, OSImageStreamRHEL10)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/mco_osimagestream.go` around lines 317 - 319, The call
site uses infraMcp.waitForComplete() immediately after AddNodeToPool, which is
redundant because AddNodeToPool already invokes WaitImmediateForUpdatedStatus;
remove the extra infraMcp.waitForComplete() call (or the line invoking
logger.Infof("OK!\n") should remain but without the redundant wait) and instead,
if guarding against the TDZ-style race is required, harden
AddNodeToPool/WaitImmediateForUpdatedStatus itself (update AddNodeToPool to
perform any additional verification or retry logic around
WaitImmediateForUpdatedStatus) so callers like this test (using infraMcp and
AddNodeToPool) do not need to layer waits.

469-481: Tighten test robustness against controller log message changes.

The test hardcodes "machineset tc-%s-dup has unsupported stream" at line 475, while the actual controller log is "machineset %s has unsupported stream: %v, skipping boot image update" (per pkg/controller/bootimage/ms_helpers.go:126). Using ContainSubstring at lines 505/518 masks this coupling—the test currently passes because its substring is contained in the actual log, but if the controller reformats the message (capitalization, phrasing, additional fields), the test becomes brittle and may silently break.

For consistency with line 541, which already uses the looser "has unsupported stream" substring, consider updating lines 475/505/518 to match on a stable substring anchor (e.g., duplicateMsName + " has unsupported stream" or just "has unsupported stream") rather than the full hardcoded message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/mco_osimagestream.go` around lines 469 - 481, Replace the
overly specific expected log string in the test and assertions with a stable
substring anchor: update the variable unsupportedStreamLogMsg (and any
assertions referencing it) to use a looser match such as duplicateMsName + " has
unsupported stream" or simply "has unsupported stream", and ensure the
ContainSubstring checks against controller log output (where NewController,
duplicateMsName and unsupportedStreamLogMsg are used) use that substring to
avoid brittle exact-format coupling with ms_helpers.go log text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d3499c12-6325-4124-af9e-d28cce7052b6

📥 Commits

Reviewing files that changed from the base of the PR and between 5f1b7ad and 043492b.

📒 Files selected for processing (2)
  • test/extended-priv/machineconfigpool.go
  • test/extended-priv/mco_osimagestream.go

Comment on lines +1552 to +1577
// AddNodeToPool adds a node to the MCP and waits for the MCP to recognize it
func (mcp *MachineConfigPool) AddNodeToPool(node *Node) error {
	currentNodes, err := mcp.GetNodes()
	if err != nil {
		return fmt.Errorf("error getting current nodes from %s: %w", mcp.GetName(), err)
	}
	expectedCount := len(currentNodes) + 1

	err = node.AddLabel(fmt.Sprintf("node-role.kubernetes.io/%s", mcp.GetName()), "")
	if err != nil {
		return fmt.Errorf("error labeling node %s to add it to pool %s: %w", node.GetName(), mcp.GetName(), err)
	}
	logger.Infof("Node %s added to pool %s", node.GetName(), mcp.GetName())

	err = mcp.WaitForMachineCount(expectedCount, 5*time.Minute)
	if err != nil {
		return fmt.Errorf("the %s MCP is not reporting the expected machine count: %w", mcp.GetName(), err)
	}

	err = mcp.WaitImmediateForUpdatedStatus()
	if err != nil {
		return fmt.Errorf("the %s MCP is not updated: %w", mcp.GetName(), err)
	}

	return nil
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ast-grep --pattern $'func (n *Node) HasLabel($_, $_) $_'
rg -nP --type=go -C2 '\bfunc \(.*Node\) (Has|Get)Label\b'

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Find the Node type definition and its methods
fd -e go test/extended-priv/
head -100 test/extended-priv/machineconfigpool.go

# Search for Node type definition
rg -A 20 "^type Node struct" test/extended-priv/

# Search for any label-related methods on Node
rg "func.*Node.*Label" test/extended-priv/

# Check what methods Node actually has
rg "func \(.*\*?Node\)" test/extended-priv/ | head -30

Repository: openshift/machine-config-operator

Length of output: 7984


🏁 Script executed:

# Also search for where AddLabel is implemented/used elsewhere
rg -B 5 -A 10 "AddLabel" test/extended-priv/ | head -80

Repository: openshift/machine-config-operator

Length of output: 5884


🏁 Script executed:

# Check how WaitImmediateForUpdatedStatus is implemented
rg -A 15 "func.*WaitImmediateForUpdatedStatus" test/extended-priv/

# Check WaitForMachineCount implementation  
rg -A 15 "func.*WaitForMachineCount" test/extended-priv/

# Check GetNodes to understand what it returns
rg -A 10 "func.*GetNodes\b" test/extended-priv/

Repository: openshift/machine-config-operator

Length of output: 5386


Consider the race condition with immediate polling and pre-existing node membership.

Two minor concerns with this helper:

  1. Pre-existing membership timeout: If the node already has the node-role.kubernetes.io/<pool> label, it will already be included in currentNodes and the expectation of len(currentNodes) + 1 machines will never be satisfied, causing WaitForMachineCount to time out. A check using GetLabel() before labeling would avoid this unnecessary wait.

  2. Race with WaitImmediateForUpdatedStatus: The immediate polling (immediate=true) means WaitImmediateForUpdatedStatus can observe the stale Updated=True status from a prior quiesced state, returning success before the MCP controller has actually begun reconciling the newly-added node. While WaitForMachineCount helps bridge this, consider explicitly waiting for Updating=True first as a handshake before checking for Updated=True, or use updatedMachineCount == machineCount as a synchronization signal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/machineconfigpool.go` around lines 1552 - 1577, In
AddNodeToPool, avoid the race where a pre-labeled node inflates currentNodes and
causes WaitForMachineCount to hang: first call
node.GetLabel("node-role.kubernetes.io/<pool>") (use mcp.GetName() to build the
key) and only call node.AddLabel(...) if the label is absent; after successfully
adding the label (or if you detect a fresh addition), wait for the MCP to see
the new machine with mcp.WaitForMachineCount(expectedCount, ...), then instead
of immediately calling WaitImmediateForUpdatedStatus, poll the MCP for
Updating==True (or assert updatedMachineCount == machineCount) as a handshake
before asserting Updated==True (so replace or precede the
WaitImmediateForUpdatedStatus call with a short WaitForUpdating/condition check
to ensure reconciliation started).
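
A minimal sketch of a hardened AddNodeToPool along these lines (node.GetLabel's exact signature and a WaitForUpdatingStatus helper are assumptions, not confirmed API in test/extended-priv):

// AddNodeToPool adds a node to the MCP and waits for the MCP to pick it up.
func (mcp *MachineConfigPool) AddNodeToPool(node *Node) error {
	labelKey := fmt.Sprintf("node-role.kubernetes.io/%s", mcp.GetName())

	// Concern 1: skip the label write (and the +1 count expectation) when the
	// node already belongs to the pool, so WaitForMachineCount cannot hang.
	if label, err := node.GetLabel(labelKey); err == nil && label != "" { // assumed signature
		logger.Infof("Node %s is already in pool %s", node.GetName(), mcp.GetName())
		return mcp.WaitImmediateForUpdatedStatus()
	}

	currentNodes, err := mcp.GetNodes()
	if err != nil {
		return fmt.Errorf("error getting current nodes from %s: %w", mcp.GetName(), err)
	}

	if err := node.AddLabel(labelKey, ""); err != nil {
		return fmt.Errorf("error labeling node %s to add it to pool %s: %w", node.GetName(), mcp.GetName(), err)
	}

	if err := mcp.WaitForMachineCount(len(currentNodes)+1, 5*time.Minute); err != nil {
		return fmt.Errorf("the %s MCP is not reporting the expected machine count: %w", mcp.GetName(), err)
	}

	// Concern 2: handshake on reconciliation having started before asserting it
	// finished, so a stale Updated=True from the prior quiesced state is not
	// accepted. Polling for updatedMachineCount == machineCount may be a more
	// robust signal if the Updating window can be missed.
	if err := mcp.WaitForUpdatingStatus(); err != nil { // assumed helper
		return fmt.Errorf("the %s MCP never started updating: %w", mcp.GetName(), err)
	}
	return mcp.WaitImmediateForUpdatedStatus()
}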

Comment on lines +456 to +466
exutil.By("Verify MCP is degraded with error about osImageURL and osImageStream conflict")
o.Eventually(customMcp, "5m", "20s").Should(HaveConditionField("RenderDegraded", "status", TrueString),
"MCP should be degraded when both osImageURL and osImageStream are set")

expectedErrorMsg := "cannot override MachineConfig osImageURL and set MachineConfigPool spec.osImageStream.name simultaneously"
o.Eventually(customMcp.Get, "2m", "10s").WithArguments(`{.status.conditions[?(@.type=="RenderDegraded")].message}`).Should(
o.ContainSubstring(expectedErrorMsg),
"MCP degraded message should mention osImageURL and osImageStream conflict")
logger.Infof("MCP correctly degraded with expected error message")
logger.Infof("OK!\n")
})

⚠️ Potential issue | 🟡 Minor

Missing explicit recovery assertion after removing the conflicting MC.

The PR summary for OCP-88365 states this test "tests recovery after removing the conflict", but the body only asserts the degraded state and relies on the deferred mc.DeleteWithWait() / DeleteCustomMCP() for cleanup — it doesn't Eventually assert that the MCP transitions back to RenderDegraded=False and Updated=True after the conflicting MC is removed. Without that assertion, a regression where the pool stays wedged in degraded would be silently masked (cleanup would likely time out or log-only error in defer).

Consider adding an explicit "remove the MC and verify recovery" step before the defers kick in.

🛡️ Suggested additional step
 		logger.Infof("MCP correctly degraded with expected error message")
 		logger.Infof("OK!\n")
+
+		exutil.By("Remove the conflicting MachineConfig and verify MCP recovers")
+		mc.DeleteWithWait()
+		o.Eventually(customMcp, "10m", "20s").Should(HaveConditionField("RenderDegraded", "status", FalseString),
+			"MCP should recover from RenderDegraded after removing the conflicting MC")
+		o.Expect(customMcp.WaitForUpdatedStatus()).To(o.Succeed(), "MCP should become Updated after recovery")
+		logger.Infof("OK!\n")
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/mco_osimagestream.go` around lines 456 - 466, After
asserting the MCP is degraded, remove the conflicting MachineConfig (call the
same removal used in cleanup, e.g. invoke mc.DeleteWithWait() or
DeleteCustomMCP()) and then explicitly assert recovery by polling the customMcp:
use o.Eventually(customMcp, "5m",
"20s").Should(HaveConditionField("RenderDegraded", "status", FalseString)) and
also check Updated condition true (e.g. HaveConditionField("Updated", "status",
TrueString)) to ensure the MCP transitions back to healthy; place these
assertions after deleting the MC and before relying on deferred cleanup to catch
regressions.

Comment on lines +483 to +492
exutil.By(fmt.Sprintf("Create duplicate MachineSet %s from existing worker MachineSet", duplicateMsName))
existingMsList := NewMachineSetList(oc.AsAdmin(), MachineAPINamespace).GetAllOrFail()
o.Expect(len(existingMsList)).To(o.BeNumerically(">", 0), "At least one MachineSet should exist")
duplicateMs, err = existingMsList[0].Duplicate(duplicateMsName)
o.Expect(err).NotTo(o.HaveOccurred(), "Error duplicating MachineSet")
defer func() {
logger.Infof("Cleaning up duplicate MachineSet %s", duplicateMsName)
duplicateMs.Delete()
}()
logger.Infof("OK!\n")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find how MachineSet role is inspected elsewhere
rg -nP --type=go -C2 '(cluster-api-machine-role|machine\.openshift\.io/cluster-api-machine-role)'
ast-grep --pattern $'func ($_ *MachineSet) $_() $$$'

Repository: openshift/machine-config-operator

Length of output: 6496


🏁 Script executed:

#!/bin/bash
# Search for MachineSet.Get() usages to understand field path format
rg -nP --type=go 'ms\.Get\(' -A2 -B2
# Search for worker role references
rg -nP --type=go '(worker|cluster-api-machine-role)' --glob='*test*' | head -30

Repository: openshift/machine-config-operator

Length of output: 11298


🏁 Script executed:

#!/bin/bash
# Find MachineSet.Duplicate() and Delete() methods
rg -nP --type=go 'func \(.*MachineSet.*\) (Duplicate|Delete)' -A3
# Search for existing worker filter patterns or helper functions
rg -nP --type=go 'NewMachineSetList|GetAllOrFail|ByLabel' -A2 -B2 | head -50

Repository: openshift/machine-config-operator

Length of output: 3954


🏁 Script executed:

#!/bin/bash
# Find the Delete() method for MachineSet
rg -nP --type=go 'func \(.*MachineSet.*\) Delete' -A3
# Search for ByLabel method on MachineSetList
rg -nP --type=go 'func.*MachineSetList.*ByLabel' -A3
# Check CloneResource to understand Duplicate's implementation
rg -nP --type=go 'func CloneResource' -A10

Repository: openshift/machine-config-operator

Length of output: 852


🏁 Script executed:

#!/bin/bash
# Find Delete method more broadly
rg -nP --type=go 'Delete.*\(\)' test/extended-priv/machineset.go -B2 -A5
# Also search in resource.go for Delete method
rg -nP --type=go 'func \(.*\) Delete' test/extended-priv/resource.go -B2 -A5
# Look for MachineSetList ByLabel
rg -nP --type=go '\.ByLabel' test/extended-priv/machineset.go -B2 -A2
# Check if Delete is called elsewhere in the codebase
rg -nP --type=go 'duplicateMs\.Delete\(\)|\.Delete\(\)' test/extended-priv/ -B1 -A1

Repository: openshift/machine-config-operator

Length of output: 16087


🏁 Script executed:

#!/bin/bash
# Find ByLabel implementation on ResourceList or MachineSetList
rg -nP --type=go 'func.*ByLabel' test/extended-priv/ -A3
# Check what labels are used to identify worker MachineSets
rg -nP --type=go 'worker.*MachineSet|MachineSet.*worker' test/extended-priv/ -B2 -A2
# Look for role-based filtering in MachineSet selection
rg -nP --type=go 'cluster-api-machine-role' test/extended-priv/ -B2 -A2

Repository: openshift/machine-config-operator

Length of output: 2623


🏁 Script executed:

#!/bin/bash
# Check if there are other instances of filtering worker MachineSets
rg -nP --type=go 'MachineSetList.*GetAll|NewMachineSetList' test/extended-priv/ -A5 | head -40
# Look for worker pool constants
rg -nP --type=go 'MachineConfigPoolWorker|MachineAPINamespace' test/extended-priv/const.go -B2 -A2

Repository: openshift/machine-config-operator

Length of output: 3399


Filter MachineSet selection to worker role and handle the cleanup deletion error.

Two issues in this block:

  1. existingMsList[0] picks the first MachineSet in API order, not necessarily a worker. Clusters with infra or other MachineSets will cause subsequent assertions tied to osstream labels to fail or produce misleading results. The comment itself says "from existing worker MachineSet"—filter explicitly using:

    existingMsList.ByLabel("machine.openshift.io/cluster-api-machine-role=worker")
  2. The deferred duplicateMs.Delete() silently discards errors. If deletion fails, the duplicate MachineSet and its provisioned machines leak into subsequent tests. Log the error:

    if derr := duplicateMs.Delete(); derr != nil {
        logger.Errorf("Failed to delete duplicate MachineSet %s: %v", duplicateMsName, derr)
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/mco_osimagestream.go` around lines 483 - 492, The code
selects the first MachineSet from existingMsList and doesn't handle delete
errors; update the selection to filter only worker MachineSets by calling
ByLabel("machine.openshift.io/cluster-api-machine-role=worker") on the
NewMachineSetList(oc.AsAdmin(), MachineAPINamespace) result before using
existingMsList[0], and change the deferred cleanup to call duplicateMs.Delete()
and log any returned error (e.g., check the error returned by
duplicateMs.Delete() and logger.Errorf with duplicateMsName and the error)
instead of discarding it.

Comment on lines +517 to +521
exutil.By("Verify controller no longer logs unsupported stream message for this MachineSet")
o.Eventually(controller.GetLogs, "2m", "10s").ShouldNot(o.ContainSubstring(unsupportedStreamLogMsg),
"Controller should not log unsupported stream message after changing to supported stream")
logger.Infof("Controller correctly stopped logging unsupported stream for %s", duplicateMsName)
logger.Infof("OK!\n")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gomega Eventually ShouldNot vs Consistently ShouldNot semantics for asserting a condition never becomes true

💡 Result:

Gomega's Eventually.ShouldNot and Consistently.ShouldNot have distinct semantics for asserting that a condition never becomes true.

Eventually(condition).ShouldNot(matcher) polls until the matcher fails (i.e., the condition does NOT match) or the timeout elapses. It succeeds as soon as any single poll fails to match, so a transient non-match causes immediate success and the rest of the window is never observed. This makes it rarely useful for "never becomes true" assertions.

Consistently(condition).ShouldNot(matcher) polls for the full duration and succeeds only if the matcher fails at EVERY poll (i.e., the condition never matches). It fails immediately if the matcher ever succeeds during polling. This is the idiomatic way to assert that a condition "never becomes true" over a time window. Example from the official docs:

    Consistently(channel, "200ms").ShouldNot(Receive) // asserts nothing is received for 200ms

To assert "never becomes true" over longer horizons, use longer durations with Consistently, or combine patterns such as a separate Eventually(finished) followed by Expect(successful).


🏁 Script executed:

cd test/extended-priv && wc -l mco_osimagestream.go

Repository: openshift/machine-config-operator

Length of output: 102


🏁 Script executed:

cd test/extended-priv && sed -n '515,545p' mco_osimagestream.go

Repository: openshift/machine-config-operator

Length of output: 1659


Replace Eventually(...).ShouldNot(...) with Consistently(...).ShouldNot(...) for log-absence verification.

After controller.IgnoreLogsBeforeNowOrFail() clears logs, GetLogs() returns an empty buffer that trivially does not contain the target substring. Eventually(...).ShouldNot(...) succeeds immediately on the first poll and never verifies the log stays absent over the full 2-minute window. A regression where the controller emits the unsupported-stream log after switching to a supported stream would not be caught.

Consistently(...).ShouldNot(...) enforces the matcher fails at every poll throughout the entire duration, which is the correct assertion for "this message must never appear."

Proposed fix
 		exutil.By("Verify controller no longer logs unsupported stream message for this MachineSet")
-		o.Eventually(controller.GetLogs, "2m", "10s").ShouldNot(o.ContainSubstring(unsupportedStreamLogMsg),
+		o.Consistently(controller.GetLogs, "2m", "10s").ShouldNot(o.ContainSubstring(unsupportedStreamLogMsg),
 			"Controller should not log unsupported stream message after changing to supported stream")
 		exutil.By("Verify boot image controller does not log unsupported stream for CPMS")
 		controller.IgnoreLogsBeforeNowOrFail()
-		o.Eventually(controller.GetLogs, "2m", "10s").ShouldNot(o.ContainSubstring("has unsupported stream"),
+		o.Consistently(controller.GetLogs, "2m", "10s").ShouldNot(o.ContainSubstring("has unsupported stream"),
 			"Boot image controller should not log unsupported stream message for CPMS")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended-priv/mco_osimagestream.go` around lines 517 - 521, The test
currently uses Eventually(controller.GetLogs, "2m",
"10s").ShouldNot(o.ContainSubstring(unsupportedStreamLogMsg)) which can pass
immediately because controller.IgnoreLogsBeforeNowOrFail() clears logs; change
this to Consistently(controller.GetLogs, "2m",
"10s").ShouldNot(o.ContainSubstring(unsupportedStreamLogMsg)) so the assertion
verifies the unsupportedStreamLogMsg never appears for the entire duration;
update the call site where controller.GetLogs and unsupportedStreamLogMsg are
referenced and ensure the same polling interval arguments are used with
Consistently instead of Eventually.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 24, 2026

@ptalgulk01: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/unit | Commit: 043492b | Required: true | Rerun command: /test unit

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

}

// AddNodeToPool adds a node to the MCP and waits for the MCP to recognize it
func (mcp *MachineConfigPool) AddNodeToPool(node *Node) error {
Contributor


We can't add nodes to a master pool, for example, so we can't define this function as a general method on the MCP struct.

It would be better to write it as an independent function that accepts an MCP as a parameter, rather than as a method on the MCP struct.

This code is very similar to the one in CreateCustomMCPWithStreamByNodes.

We can make the function add several nodes, not just one, to the pool and reuse it in CreateCustomMCPWithStreamByNodes, avoiding duplicated code; see the sketch below.
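
A rough sketch of that standalone, variadic helper (the name and placement are illustrative; it only reuses methods already shown in this PR):

// AddNodesToPool labels the given nodes into the pool and waits for the MCP to
// pick them up. As a free function, callers decide which pools it is valid for
// (e.g., never the master pool).
func AddNodesToPool(mcp *MachineConfigPool, nodes ...*Node) error {
	currentNodes, err := mcp.GetNodes()
	if err != nil {
		return fmt.Errorf("error getting current nodes from %s: %w", mcp.GetName(), err)
	}

	labelKey := fmt.Sprintf("node-role.kubernetes.io/%s", mcp.GetName())
	for _, node := range nodes {
		if err := node.AddLabel(labelKey, ""); err != nil {
			return fmt.Errorf("error labeling node %s to add it to pool %s: %w", node.GetName(), mcp.GetName(), err)
		}
	}

	if err := mcp.WaitForMachineCount(len(currentNodes)+len(nodes), 5*time.Minute); err != nil {
		return fmt.Errorf("the %s MCP is not reporting the expected machine count: %w", mcp.GetName(), err)
	}
	return mcp.WaitImmediateForUpdatedStatus()
}

CreateCustomMCPWithStreamByNodes could then call AddNodesToPool(mcp, nodes...) instead of carrying its own copy of this logic.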

Comment on lines +240 to +241
g.Skip("The cluster is SNO/Compact. This test cannot be executed in SNO/Compact clusters")
}
Contributor


Let's use the SkipIfCompactOrSNO function that is used in the private repo. It seems it hasn't been migrated yet, but it should be migrated anyway.

// SkipIfCompactOrSNO skips the test case if the cluster is a compact or SNO cluster
func SkipIfCompactOrSNO(oc *exutil.CLI) {
	if IsCompactOrSNOCluster(oc) {
		g.Skip("The test is not supported in Compact or SNO clusters")
	}
}

Comment on lines +348 to +350
		if IsCompactOrSNOCluster(oc.AsAdmin()) {
			g.Skip("The cluster is SNO/Compact. This test cannot be executed in SNO/Compact clusters")
		}
Contributor


Same, let's migrate and use the SkipIfCompactOrSNO function

Comment on lines +393 to +399
		o.Eventually(func() string {
			currentMOSB, err := mosc.GetCurrentMachineOSBuild()
			if err != nil {
				return ""
			}
			return currentMOSB.GetName()
		}, "5m", "20s").ShouldNot(o.Equal(initialMOSBName), "A new MachineOSBuild should be triggered after osImageStream change")
Contributor


It is cleaner if we don't swallow the error silently

	o.Eventually(func() (string, error) {
		mosb, err := mosc.GetCurrentMachineOSBuild()
		if err != nil {
			return "", err
		}
		return mosb.GetName(), nil
	}, "5m", "20s").ShouldNot(o.Equal(initialMOSBName),
		"A new MachineOSBuild should be triggered after osImageStream change")

Comment on lines +408 to +410
		if IsCompactOrSNOCluster(oc.AsAdmin()) {
			g.Skip("The cluster is SNO/Compact. This test cannot be executed in SNO/Compact clusters")
		}
Contributor


Same, let's use the SkipIfCompactOrSNO function.

logger.Infof("OK!\n")

exutil.By(fmt.Sprintf("Set osstream label to %s on MachineSet", OSImageStreamRHEL10))
err = duplicateMs.Patch("merge", fmt.Sprintf(`{"metadata":{"labels":{"machineconfiguration.openshift.io/osstream":"%s"}}}`, OSImageStreamRHEL10))
Contributor


We need a method for this so that it can be reused in future tests.

We could use AddLabel instead, but I think it would be better to use an even more specific method so that we don't have to care about the name of the label:

duplicateMs.AddOSStreamLabel(...)

This function can then call AddLabel internally; a sketch follows.
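
A possible shape for that method, assuming MachineSet exposes the same generic AddLabel helper the review mentions:

// osStreamLabelKey is the label the boot image controller reads the stream from.
const osStreamLabelKey = "machineconfiguration.openshift.io/osstream"

// AddOSStreamLabel sets the osstream label on the MachineSet, so callers don't
// hardcode the label key or hand-build a JSON merge patch.
func (ms *MachineSet) AddOSStreamLabel(stream string) error {
	return ms.AddLabel(osStreamLabelKey, stream)
}

The call above would then become: err = duplicateMs.AddOSStreamLabel(OSImageStreamRHEL10)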

		// Reset log checkpoint before changing to supported stream to verify the new behavior
		exutil.By("Change osstream label to rhel-9 and verify no unsupported stream log is triggered")
		controller.IgnoreLogsBeforeNowOrFail()
		err = duplicateMs.Patch("merge", fmt.Sprintf(`{"metadata":{"labels":{"machineconfiguration.openshift.io/osstream":"%s"}}}`, OSImageStreamRHEL9))
Contributor


Same as above.

Comment on lines +523 to +534
exutil.By("Enable CPMS for managed boot images")
cpms = NewControlPlaneMachineSet(oc.AsAdmin(), MachineAPINamespace, ControlPlaneMachineSetName)
o.Expect(cpms.Exists()).To(o.BeTrue(), "ControlPlaneMachineSet should exist for this test")

machineConfig = GetMachineConfiguration(oc.AsAdmin())
defer machineConfig.RemoveManagedBootImagesConfig()

o.Expect(
machineConfig.SetAllManagedBootImagesConfig(ControlPlaneMachineSetResource),
).To(o.Succeed(), "Error enabling CPMS for managed boot images")
logger.Infof("OK!\n")

Contributor


CPMS and MachineSets are two different functionalities.

We should cover the CPMS scenario in a separate test case.

Comment on lines +300 to +302
		err = NewMCOTemplate(oc.AsAdmin(), "custom-machine-config-pool.yaml").Create(
			"-p", fmt.Sprintf("NAME=%s", infraName),
		)
Contributor


Call CreateCustomMCP


exutil.By("Create custom MCP with rhel-9 configured")
defer DeleteCustomMCP(oc.AsAdmin(), mcp1Name)
mcp1, err := CreateCustomMCP(oc.AsAdmin(), mcp1Name, 1)
Contributor


CreateCustomMCPWithStream(oc.AsAdmin(), mcp2Name, OSImageStreamRHEL9, 1)
