OCPBUGS-81761: Cloud ClusterHostedDNS Load Balancer IPs need to be returned in a predictable order #5870

Open
sadasu wants to merge 2 commits into openshift:main from sadasu:cloud-custom-dns

Conversation

@sadasu
Contributor

@sadasu sadasu commented Apr 21, 2026

Fixes: https://redhat.atlassian.net/browse/OCPBUGS-81761

With the current behavior, when there are multiple Cloud LB IPs for API, APIInt or Ingress, the IPs are returned in random order. This caused the CoreDNS pod to be regenerated from its template every time, which in turn caused machine config controller churn.
This PR sorts the LB IPs before they are used to generate the CoreDNS pod from its template, and adds a unit test to verify this behavior.
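
The mechanism described above can be illustrated with a minimal sketch. It is not the MCO code: `IP` stands in for `configv1.IP`, and `podTemplate`, `renderHash`, and `sortedCopy` are hypothetical names used only for illustration. It shows that rendering the same set of IPs in a different order yields different output, while sorting a copy first yields identical output.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"slices"
	"text/template"
)

// IP is a stand-in for configv1.IP, which is a string-based type.
type IP string

// podTemplate is a hypothetical, heavily simplified stand-in for the
// CoreDNS pod template; it only lists the IPs it is given.
const podTemplate = `{{range .}}- {{.}}
{{end}}`

// renderHash renders the IP list through the template and returns a
// hex digest of the rendered bytes.
func renderHash(ips []IP) string {
	var buf bytes.Buffer
	tmpl := template.Must(template.New("coredns").Parse(podTemplate))
	if err := tmpl.Execute(&buf, ips); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", sha256.Sum256(buf.Bytes()))
}

// sortedCopy returns a sorted copy and leaves the input untouched.
func sortedCopy(ips []IP) []IP {
	out := slices.Clone(ips)
	slices.Sort(out)
	return out
}

func main() {
	a := []IP{"10.0.18.7", "10.0.79.44"}
	b := []IP{"10.0.79.44", "10.0.18.7"}

	// Same set of IPs, different order: the rendered output differs.
	fmt.Println("unsorted renders match:", renderHash(a) == renderHash(b))
	// After sorting a copy of each input, the rendered output is identical.
	fmt.Println("sorted renders match:  ", renderHash(sortedCopy(a)) == renderHash(sortedCopy(b)))
}
```

Sorting a clone rather than the input keeps the caller's slice unchanged, which matters if that slice is shared with other consumers.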

Summary by CodeRabbit

  • Bug Fixes

    • Load balancer IP lists from cloud platform integrations (GCP, AWS, Azure) are now consistently and deterministically sorted, improving stability and predictability.
  • Tests

    • Added unit tests to verify load balancer IP ordering is consistent and correctly sorted across supported cloud platforms.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 21, 2026
@openshift-ci-robot
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-81761, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 26be2665-ef77-4a03-b526-2c8b067fe79a

📥 Commits

Reviewing files that changed from the base of the PR and between 33aeb6e and fb3a965.

📒 Files selected for processing (1)
  • pkg/controller/template/render.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/controller/template/render.go

Walkthrough

Added deterministic sorting of load-balancer IP slices by cloning and ordering IP lists before return; introduced a unit test that verifies stable ordering across GCP, AWS, and Azure using varied input permutations.

Changes

Cohort / File(s) | Summary
Test Coverage: pkg/controller/template/cloudplatform_lb_test.go | New test TestCloudPlatformLoadBalancerIPsOrdering that builds platform-specific RenderConfig permutations and asserts the three IP-retrieval helpers return a deterministically sorted []configv1.IP.
IP Retrieval Functions: pkg/controller/template/render.go | Added unexported cloneAndSortIPs using net/netip; updated cloudPlatformAPIIntLoadBalancerIPs, cloudPlatformAPILoadBalancerIPs, and cloudPlatformIngressLoadBalancerIPs to handle errors from cloudPlatformLoadBalancerIPs, clone and sort the returned []configv1.IP, and return the sorted slice.
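
A rough sketch of the permutation check that the test summary describes, under stated assumptions: `IP` stands in for `configv1.IP`, `cloneAndSortIPs` mirrors the clone-then-sort helper quoted in the review snippets rather than the merged code, and `allPermutationsAgree` is a hypothetical helper. The real test builds platform-specific `RenderConfig` values, which are omitted here.

```go
package main

import (
	"fmt"
	"slices"
)

// IP is a stand-in for configv1.IP, which is a string-based type.
type IP string

// cloneAndSortIPs mirrors the helper discussed in the review: it sorts
// a copy so the caller's slice keeps its original order.
func cloneAndSortIPs(ips []IP) []IP {
	sorted := slices.Clone(ips)
	slices.Sort(sorted)
	return sorted
}

// allPermutationsAgree reports whether every input ordering produces
// the same expected result after cloneAndSortIPs.
func allPermutationsAgree(perms [][]IP, want []IP) bool {
	for _, p := range perms {
		if !slices.Equal(cloneAndSortIPs(p), want) {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative addresses from the 192.0.2.0/24 documentation range.
	perms := [][]IP{
		{"192.0.2.10", "192.0.2.20", "192.0.2.30"},
		{"192.0.2.30", "192.0.2.10", "192.0.2.20"},
		{"192.0.2.20", "192.0.2.30", "192.0.2.10"},
	}
	want := []IP{"192.0.2.10", "192.0.2.20", "192.0.2.30"}
	fmt.Println("all permutations agree:", allPermutationsAgree(perms, want))
}
```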

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title directly summarizes the main change: ensuring cloud load balancer IPs are returned in deterministic order, which aligns with the changeset that adds IP sorting logic and tests.
Docstring Coverage | ✅ Passed | Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | Pull request introduces standard Go tests with stable, deterministic names derived from compile-time constants, not Ginkgo tests.
Test Structure And Quality | ✅ Passed | This PR introduces standard Go unit tests using testing.T, not Ginkgo tests. The custom check is designed for Ginkgo test code and is not applicable to this codebase context.
Microshift Test Compatibility | ✅ Passed | This PR adds only a standard Go unit test using the testing package with no Ginkgo imports or patterns. The custom check applies specifically to new Ginkgo e2e tests, which are not present.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The pull request adds a standard Go unit test, not a Ginkgo e2e test, so SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility | ✅ Passed | This pull request does not introduce topology-aware scheduling constraints. Changes are limited to deterministic sorting of cloud load balancer IP addresses in helper functions for DNS configuration.
Ote Binary Stdout Contract | ✅ Passed | No process-level stdout writes detected. Changes add unit test and sorting helper functions without modifying main(), init(), or Ginkgo suite-level code.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR does not add Ginkgo e2e tests. It adds a standard Go unit test using testing.T package that validates IP address sorting without IPv4 assumptions or external connectivity requirements.

@sadasu
Contributor Author

sadasu commented Apr 21, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 21, 2026
@openshift-ci-robot
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-81761, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
pkg/controller/template/render.go (1)

823-829: Consider deduplicating the clone+sort logic into one helper.

Lines 823-853 repeat the same transformation three times; a shared helper would reduce drift risk.

♻️ Suggested refactor
+func cloudPlatformSortedLoadBalancerIPs(cfg RenderConfig, lbType LoadBalancerType) (interface{}, error) {
+	ips, err := cloudPlatformLoadBalancerIPs(cfg, lbType)
+	if err != nil {
+		return nil, err
+	}
+	ipsClone := slices.Clone(ips.([]configv1.IP))
+	slices.Sort(ipsClone)
+	return ipsClone, nil
+}
+
 func cloudPlatformAPIIntLoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
-	ips, err := cloudPlatformLoadBalancerIPs(cfg, apiIntLB)
-	if err != nil {
-		return nil, err
-	}
-	ipsClone := slices.Clone(ips.([]configv1.IP))
-	slices.Sort(ipsClone)
-	return ipsClone, nil
+	return cloudPlatformSortedLoadBalancerIPs(cfg, apiIntLB)
 }
 
 func cloudPlatformAPILoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
-	ips, err := cloudPlatformLoadBalancerIPs(cfg, apiLB)
-	if err != nil {
-		return nil, err
-	}
-	ipsClone := slices.Clone(ips.([]configv1.IP))
-	slices.Sort(ipsClone)
-	return ipsClone, nil
+	return cloudPlatformSortedLoadBalancerIPs(cfg, apiLB)
 }
 
 func cloudPlatformIngressLoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
-	ips, err := cloudPlatformLoadBalancerIPs(cfg, ingressLB)
-	if err != nil {
-		return nil, err
-	}
-	ipsClone := slices.Clone(ips.([]configv1.IP))
-	slices.Sort(ipsClone)
-	return ipsClone, nil
+	return cloudPlatformSortedLoadBalancerIPs(cfg, ingressLB)
 }

Also applies to: 835-841, 847-853

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/template/render.go` around lines 823 - 829, The clone+sort
pattern used after cloudPlatformLoadBalancerIPs (taking ips, err, doing
slices.Clone then slices.Sort and returning) should be extracted into a small
helper (e.g., normalizeIPs or cloneAndSortIPs) to avoid repeating the same logic
in render.go; implement a helper that accepts the returned ips type
([]configv1.IP), performs the slice clone and sort, and returns the sorted
[]configv1.IP, then replace the three inline blocks (the code using ips,
ipsClone and slices.Sort) to call that helper from where
cloudPlatformLoadBalancerIPs is used.
pkg/controller/template/cloudplatform_lb_test.go (1)

87-89: Harden the fixture against slice aliasing side effects.

Lines 87-89 reuse the same backing slice for all three LB fields; cloning per field makes the test more robust against accidental in-place mutation in future changes.

🧪 Suggested test hardening
+import "slices"
 ...
-	lbConfig.ClusterHosted.APIIntLoadBalancerIPs = ips
-	lbConfig.ClusterHosted.APILoadBalancerIPs = ips
-	lbConfig.ClusterHosted.IngressLoadBalancerIPs = ips
+	lbConfig.ClusterHosted.APIIntLoadBalancerIPs = slices.Clone(ips)
+	lbConfig.ClusterHosted.APILoadBalancerIPs = slices.Clone(ips)
+	lbConfig.ClusterHosted.IngressLoadBalancerIPs = slices.Clone(ips)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/template/cloudplatform_lb_test.go` around lines 87 - 89, The
test currently assigns the same backing slice to three fields
(lbConfig.ClusterHosted.APIIntLoadBalancerIPs, APILoadBalancerIPs,
IngressLoadBalancerIPs) which risks aliasing; change each assignment to use a
cloned slice (e.g., append([]configv1.IP(nil), ips...) or make+copy) so each field
gets its own backing array, preventing in-place mutations of one field from
affecting the others during the tests.
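
The aliasing risk raised in this comment can be shown with a small sketch. `IP` stands in for `configv1.IP`, and `clusterHosted`, `sharedFixture`, and `clonedFixture` are hypothetical names that only approximate the fixture in the test file.

```go
package main

import (
	"fmt"
	"slices"
)

// IP is a stand-in for configv1.IP, which is a string-based type.
type IP string

// clusterHosted is a hypothetical stand-in for the struct holding the
// three load balancer IP fields set by the test fixture.
type clusterHosted struct {
	APIIntLoadBalancerIPs  []IP
	APILoadBalancerIPs     []IP
	IngressLoadBalancerIPs []IP
}

// sharedFixture assigns the same slice to all three fields, so they
// share one backing array.
func sharedFixture(ips []IP) clusterHosted {
	return clusterHosted{
		APIIntLoadBalancerIPs:  ips,
		APILoadBalancerIPs:     ips,
		IngressLoadBalancerIPs: ips,
	}
}

// clonedFixture gives each field its own backing array.
func clonedFixture(ips []IP) clusterHosted {
	return clusterHosted{
		APIIntLoadBalancerIPs:  slices.Clone(ips),
		APILoadBalancerIPs:     slices.Clone(ips),
		IngressLoadBalancerIPs: slices.Clone(ips),
	}
}

func main() {
	shared := sharedFixture([]IP{"10.0.79.44", "10.0.18.7"})
	slices.Sort(shared.APILoadBalancerIPs) // in-place sort of one field
	fmt.Println("shared ingress[0]:", shared.IngressLoadBalancerIPs[0])

	cloned := clonedFixture([]IP{"10.0.79.44", "10.0.18.7"})
	slices.Sort(cloned.APILoadBalancerIPs)
	fmt.Println("cloned ingress[0]:", cloned.IngressLoadBalancerIPs[0])
}
```

With the shared fixture, an in-place sort of one field reorders the other two as well, which could mask a helper that forgets to clone before sorting. With cloned fields, such a mutation stays local to the field it touched.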

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: efa5ff8c-f5db-4ecb-9897-297ac10052d6

📥 Commits

Reviewing files that changed from the base of the PR and between c3a9db7 and ccf282d.

📒 Files selected for processing (2)
  • pkg/controller/template/cloudplatform_lb_test.go
  • pkg/controller/template/render.go

Add a test that checks that when there are multiple LB IPs, they
are returned in a predictable order so that there is no machine-config
controller churn.
@sadasu sadasu force-pushed the cloud-custom-dns branch from ccf282d to 886031d Compare April 22, 2026 15:05

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/controller/template/render.go (1)

820-825: Strengthen type-safety by removing interface{} from IP sorting path.

This helper currently depends on an unchecked assertion and can panic if the upstream return type ever drifts. Prefer typed []configv1.IP end-to-end.

Proposed refactor
-func cloudPlatformLoadBalancerIPs(cfg RenderConfig, lbType LoadBalancerType) (interface{}, error) {
+func cloudPlatformLoadBalancerIPs(cfg RenderConfig, lbType LoadBalancerType) ([]configv1.IP, error) {

-func cloneAndSortIPs(ips interface{}) []configv1.IP {
-	ipsSorted := slices.Clone(ips.([]configv1.IP))
+func cloneAndSortIPs(ips []configv1.IP) []configv1.IP {
+	ipsSorted := slices.Clone(ips)
 	slices.Sort(ipsSorted)
 	return ipsSorted
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/template/render.go` around lines 820 - 825, The
cloneAndSortIPs helper currently accepts an interface{} and does an unchecked
type assertion which can panic; change its signature to accept a typed
[]configv1.IP (func cloneAndSortIPs(ips []configv1.IP) []configv1.IP), remove
the assertion inside, perform slices.Clone and slices.Sort directly on the typed
slice, and then update all callers to pass a []configv1.IP rather than
interface{} (adjust call sites that currently pass generic values or forward
upstream returns to ensure they return the concrete []configv1.IP type).
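
If a function has to keep an `interface{}` return type for some reason, one intermediate option (an assumption for illustration, not something proposed in this review) is the comma-ok form of the type assertion, which turns an unexpected dynamic type into an error instead of a panic. `IP` stands in for `configv1.IP` and `asIPs` is a hypothetical helper.

```go
package main

import "fmt"

// IP is a stand-in for configv1.IP, which is a string-based type.
type IP string

// asIPs converts an interface{} value to []IP using the comma-ok form,
// so a mismatched dynamic type is reported as an error rather than
// causing a runtime panic.
func asIPs(v interface{}) ([]IP, error) {
	ips, ok := v.([]IP)
	if !ok {
		return nil, fmt.Errorf("expected []IP, got %T", v)
	}
	return ips, nil
}

func main() {
	// A value of the wrong dynamic type yields an error.
	if _, err := asIPs([]string{"10.0.18.7"}); err != nil {
		fmt.Println("error:", err)
	}

	// A value of the expected type passes through unchanged.
	ips, err := asIPs([]IP{"10.0.18.7", "10.0.79.44"})
	fmt.Println(len(ips), err)
}
```

The typed signature suggested in the review is still the stronger fix, since it moves the check to compile time.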

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 95704a49-5dfe-44e7-8182-508b79a0c65a

📥 Commits

Reviewing files that changed from the base of the PR and between ccf282d and 886031d.

📒 Files selected for processing (2)
  • pkg/controller/template/cloudplatform_lb_test.go
  • pkg/controller/template/render.go

@sadasu sadasu force-pushed the cloud-custom-dns branch from 886031d to 33aeb6e Compare April 22, 2026 16:20

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/controller/template/render.go (1)

830-855: Prefer typed helper returns over unchecked assertions.

ips.([]configv1.IP) can panic if cloudPlatformLoadBalancerIPs ever changes concrete return type. Returning []configv1.IP directly from the helper makes this path compile-time safe.

Proposed refactor
-func cloudPlatformLoadBalancerIPs(cfg RenderConfig, lbType LoadBalancerType) (interface{}, error) {
+func cloudPlatformLoadBalancerIPs(cfg RenderConfig, lbType LoadBalancerType) ([]configv1.IP, error) {
 	...
 }

 func cloudPlatformAPIIntLoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
 	ips, err := cloudPlatformLoadBalancerIPs(cfg, apiIntLB)
 	if err != nil {
 		return nil, err
 	}
-	return cloneAndSortIPs(ips.([]configv1.IP)), nil
+	return cloneAndSortIPs(ips), nil
 }

 func cloudPlatformAPILoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
 	ips, err := cloudPlatformLoadBalancerIPs(cfg, apiLB)
 	if err != nil {
 		return nil, err
 	}
-	return cloneAndSortIPs(ips.([]configv1.IP)), nil
+	return cloneAndSortIPs(ips), nil
 }

 func cloudPlatformIngressLoadBalancerIPs(cfg RenderConfig) (interface{}, error) {
 	ips, err := cloudPlatformLoadBalancerIPs(cfg, ingressLB)
 	if err != nil {
 		return nil, err
 	}
-	return cloneAndSortIPs(ips.([]configv1.IP)), nil
+	return cloneAndSortIPs(ips), nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/template/render.go` around lines 830 - 855, The helper
cloudPlatformLoadBalancerIPs should return a concrete type ([]configv1.IP,
error) instead of interface{} so callers don't need unchecked assertions; change
cloudPlatformLoadBalancerIPs signature and its internal returns to
([]configv1.IP, error), update callers cloudPlatformAPILoadBalancerIPs,
cloudPlatformIngressLoadBalancerIPs and the other caller that passes
apiIntLB/apiLB/ingressLB to accept that typed return, and remove the
ips.([]configv1.IP) assertions—pass the typed slice directly into
cloneAndSortIPs (which expects []configv1.IP) and propagate errors as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 83597f20-3d4b-4705-ab5a-8cfcbbeeeb11

📥 Commits

Reviewing files that changed from the base of the PR and between 886031d and 33aeb6e.

📒 Files selected for processing (1)
  • pkg/controller/template/render.go

@sadasu
Contributor Author

sadasu commented Apr 22, 2026

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@sadasu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b103b3c0-3e7d-11f1-8694-80806987c267-0

@tthvo
Member

tthvo commented Apr 22, 2026

/test unit

@tthvo
Member

tthvo commented Apr 22, 2026

/test e2e-aws-ovn e2e-aws-ovn-upgrade e2e-gcp-op-part1 e2e-gcp-op-part2 e2e-gcp-op-single-node e2e-hypershift

Comment thread pkg/controller/template/render.go
Return a sorted list of Cloud Load Balancer IPs so that changes in order
do not cause the CoreDNS pod YAML to be regenerated and trigger
machine config churn.
Note that this problem occurs only when the ClusterHostedDNS feature
is enabled and not during regular AWS installs.
@sadasu sadasu force-pushed the cloud-custom-dns branch from 33aeb6e to fb3a965 Compare April 22, 2026 21:17
@tthvo
Member

tthvo commented Apr 22, 2026

/test e2e-aws-ovn e2e-aws-ovn-upgrade e2e-gcp-op-part1 e2e-gcp-op-part2 e2e-gcp-op-single-node e2e-hypershift

Member

@tthvo tthvo left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 22, 2026
@openshift-merge-bot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sadasu, tthvo
Once this PR has been reviewed and has the lgtm label, please assign isabella-janssen for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sadasu
Contributor Author

sadasu commented Apr 22, 2026

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@sadasu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4c40d2a0-3e92-11f1-8111-9fc7e68a182d-0

@sadasu
Contributor Author

sadasu commented Apr 23, 2026

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

@sadasu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-custom-dns-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f261e3a0-3eaa-11f1-8fa0-6a065099d827-0

@sadasu
Contributor Author

sadasu commented Apr 23, 2026

Install-config contained multiple zones:

featureSet: TechPreviewNoUpgrade
controlPlane:
  platform:
    aws:
      zones:
      - us-east-1f
      - us-east-1a
      type: m6a.2xlarge
  architecture: amd64
  name: master
  replicas: 3
compute:
- platform:
    aws:
      zones:
      - us-east-1f
      - us-east-1a
      type: m6a.2xlarge
  architecture: amd64
  name: worker
  replicas: 3

Infrastructure resource contained multiple LB IPs. ClusterHosted Load Balancer IPs:

cloudLoadBalancerConfig:
  dnsType: ClusterHosted
  clusterHosted:
    apiIntLoadBalancerIPs:
    - 10.0.18.7
    - 10.0.79.44
    apiLoadBalancerIPs:
    - 98.86.52.193
    - 52.206.120.100
    ingressLoadBalancerIPs:
    - 52.21.128.239
    - 52.70.209.201

The master nodes were rebooted 5 times based on the boot IDs in the kubelet service logs:

  1. Boot 1 (da26e67fa230433fbc3f6209cb20eeae): ~00:57:17 - Initial/bootstrap boot
  2. Boot 2 (1741f5a98dbb447f90b8d47aa9b4872f): ~00:59:00 - 2 minutes later
  3. Boot 3 (9e39e375c08147e9b13a69a292138006): ~01:09:22 - 10 minutes later
  4. Boot 4 (d42265c435f44b06b632dad91db1af8a): ~01:16:35 - 7 minutes later (This is the CoreDNS.yaml update)
  5. Boot 5 (5d97f91b79ec49ea8a9b97635f8f8c90): ~01:27:04 - 11 minutes later

Only one update to CoreDNS.yaml was made, which shows that sorting the IPs before rendering kept them from being returned in random order.

@sadasu
Contributor Author

sadasu commented Apr 23, 2026

/verified by CI

See #5870 (comment) and #5870 (comment)

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 23, 2026
@openshift-ci-robot
Contributor

@sadasu: This PR has been marked as verified by CI.

@sadasu
Contributor Author

sadasu commented Apr 23, 2026

/retest

@sadasu
Contributor Author

sadasu commented Apr 24, 2026

/test e2e-hypershift

@openshift-ci
Contributor

openshift-ci Bot commented Apr 24, 2026

@sadasu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sadasu
Contributor Author

sadasu commented Apr 28, 2026

@isabella-janssen , @cheesesashimi Could you PTAL?


// cloneAndSortIPs clones and sorts IP addresses to ensure consistent ordering.
func cloneAndSortIPs(ips []configv1.IP) []configv1.IP {
ipsSorted := slices.Clone(ips)
Contributor

Are we 100% sure about this change? Order matters in DNS, and discarding the original order to sort the IPs numerically drops the DNS NS priority entirely.

Member

I agree with this thought. In fact, the tests that I proposed in #5840 ensure that the MCO writes the configs while preserving the order in which the input was given.
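
For context on what "sorted" can mean here, the following sketch contrasts two orderings. It is an illustration under assumptions, not the PR's code: `IP` stands in for `configv1.IP`, and `sortLexical` and `sortNumeric` are hypothetical helpers. `slices.Sort` on a string-based type compares strings byte by byte, while parsing with `net/netip` orders by address value. Neither preserves the input order, so the concern raised in this thread applies to both.

```go
package main

import (
	"fmt"
	"net/netip"
	"slices"
)

// IP is a stand-in for configv1.IP, which is a string-based type.
type IP string

// sortLexical returns a copy ordered by plain string comparison, which
// is what slices.Sort does for a string-based element type.
func sortLexical(ips []IP) []IP {
	out := slices.Clone(ips)
	slices.Sort(out)
	return out
}

// sortNumeric parses each entry and returns a copy ordered by address
// value. It returns an error if any entry is not a valid IP address.
func sortNumeric(ips []IP) ([]IP, error) {
	addrs := make([]netip.Addr, 0, len(ips))
	for _, ip := range ips {
		addr, err := netip.ParseAddr(string(ip))
		if err != nil {
			return nil, err
		}
		addrs = append(addrs, addr)
	}
	slices.SortFunc(addrs, func(a, b netip.Addr) int { return a.Compare(b) })

	out := make([]IP, len(addrs))
	for i, addr := range addrs {
		out[i] = IP(addr.String())
	}
	return out, nil
}

func main() {
	in := []IP{"10.0.9.1", "10.0.18.7"}

	fmt.Println(sortLexical(in)) // [10.0.18.7 10.0.9.1]

	numeric, err := sortNumeric(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(numeric) // [10.0.9.1 10.0.18.7]
}
```

Both orderings are deterministic, which is what matters for avoiding churn. They differ only in which address comes first.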

Labels

jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

5 participants