Skip to content

fix: adjusting SortHosts to sort descending for two-node topologies#10249

Draft
fracappa wants to merge 1 commit intoopenshift:masterfrom
fracappa:OCPBUGS-81499/fix-sort-hosts-two-node-topology
Draft

fix: adjusting SortHosts to sort descending for two-node topologies#10249
fracappa wants to merge 1 commit intoopenshift:masterfrom
fracappa:OCPBUGS-81499/fix-sort-hosts-two-node-topology

Conversation

@fracappa
Copy link
Copy Markdown

@fracappa fracappa commented Apr 28, 2026

When workers have resources similiar to arbiter/masters, the ascending sort causes the least capable hosts to be greedly assigned as masters. For two-node topologies where masters are schedulable, the most capable hosts should be the masters instead. Sort descending when ControlPlaneCount is 2 so the beefiest nodes get master roles and the lightest one becomes the arbiters.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Fixed host role assignment during cluster installation to use correct sorted host ordering.
    • Enhanced two-node topology support to properly assign arbiter and worker roles based on host capabilities.
    • Improved host sorting logic to ensure role assignment aligns with topology configuration.

When workers have resources similiar to arbiter/masters, the ascending sort
causes the least capable hosts to be greedly assigned as masters. For
two-node topologies where masters are schedulable, the most capable hosts
should be the masters instead. Sort descending when ControlPlaneCount is 2
so the beefiest nodes get master roles and the lightest one becomes the arbiters.
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 28, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 28, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Walkthrough

These changes modify the host-role auto-assignment logic during cluster installation to account for control plane count, particularly for two-node arbiter topologies. The SortHosts function now accepts controlPlaneCount as a parameter and adjusts host weight sorting accordingly. The role assignment loop is corrected to use sorted host objects rather than indexing into the original unsorted list.

Changes

Cohort / File(s) Summary
Host Sorting Logic
internal/host/monitor.go
Updated SortHosts function signature to accept controlPlaneCount parameter. Added isTwoNodeTopology derivation logic that reverses weight comparison results for two-node topologies, and includes TNA-specific logic to relocate the least capable non-GPU host to position controlPlaneCount. Monitoring loop updated to pass c.ControlPlaneCount into SortHosts.
Role Assignment
internal/bminventory/inventory.go
Auto-role assignment during InstallClusterInternal now derives sorted host list using cluster's ControlPlaneCount. Fixed assignment loop to iterate over sorted host objects from sortedHosts rather than indexing into the original unsorted cluster.Hosts.
Test Updates
internal/host/host_test.go
Updated existing host-role auto-assignment and host-sorting tests to pass explicit topology/master-count parameter to SortHosts. Added new TNA day-0 test verifying arbiter mode behavior produces exactly 2 masters, 1 arbiter, and 2 workers. Extended sortHostByHardware coverage with two-node arbiter mode expected ordering.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
Stable And Deterministic Test Names ❌ Error Test on line 2214 of internal/host/host_test.go uses dynamic test name generation with fmt.Sprintf, violating static test name requirements. Replace dynamic test name generation with static descriptive titles like 'It("SetBootstrap to true", func() {' and 'It("SetBootstrap to false", func() {'.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description is mostly complete but lacks critical information about issue linkage, detailed testing methodology, and commit message confirmation. Link to the related issue (OCPBUGS-81499), specify how the change was tested in dev-scripts, confirm title/description in commit, and explain why unit tests weren't added despite code changes.
✅ Passed checks (9 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adjusting SortHosts to sort descending for two-node topologies, which matches the core functionality change in monitor.go.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Test Structure And Quality ✅ Passed New tests demonstrate good quality practices with meaningful assertion messages, proper isolation by role type, established Ginkgo patterns, context/timeouts, and consistent fixture generation.
Microshift Test Compatibility ✅ Passed The three new test cases identified are unit tests in internal/host/host_test.go, not e2e tests, and use mocks and fixtures rather than live cluster APIs.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. The modified test file is a standard Go unit test, so the SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed Changes modify assisted-service provisioning logic (host sorting) not deployment manifests or operator code deployed to clusters; already implements topology-awareness for TNA scenarios.
Ote Binary Stdout Contract ✅ Passed The pull request does not violate the OTE Binary Stdout Contract. No stdout writes were introduced in process-level code; changes are isolated to utility functions and test code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed New tests in internal/host/host_test.go contain no hardcoded IPv4 addresses, external URLs, or external connectivity requirements.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fracappa
Once this PR has been reviewed and has the lgtm label, please assign crystalchun for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/host/host_test.go (1)

3231-3231: Prefer scenario-driven control plane count over a hardcoded HA constant.

Line 3231 and Line 3264 currently hardcode HA mode. Using cluster.ControlPlaneCount keeps these tests aligned with their own setup and avoids future drift.

♻️ Suggested refactor
-		sortedHosts, _ := SortHosts(hosts, common.MinMasterHostsNeededForInstallationInHaMode)
+		sortedHosts, _ := SortHosts(hosts, cluster.ControlPlaneCount)
-		sortedHosts, _ := SortHosts(hosts, common.MinMasterHostsNeededForInstallationInHaMode)
+		sortedHosts, _ := SortHosts(hosts, cluster.ControlPlaneCount)

Also applies to: 3264-3264

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/host/host_test.go` at line 3231, The test uses a hardcoded HA
constant when calling SortHosts (sortedHosts, _ := SortHosts(hosts,
common.MinMasterHostsNeededForInstallationInHaMode)); replace that constant with
the scenario's actual control plane count (e.g., use cluster.ControlPlaneCount
or whatever test fixture variable holds the ControlPlaneCount) so the SortHosts
calls use the test's configured control plane count; update both occurrences
(the call assigning sortedHosts and the other call at the other location) to
pass the scenario-driven ControlPlaneCount instead of
common.MinMasterHostsNeededForInstallationInHaMode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/host/host_test.go`:
- Line 3231: The test uses a hardcoded HA constant when calling SortHosts
(sortedHosts, _ := SortHosts(hosts,
common.MinMasterHostsNeededForInstallationInHaMode)); replace that constant with
the scenario's actual control plane count (e.g., use cluster.ControlPlaneCount
or whatever test fixture variable holds the ControlPlaneCount) so the SortHosts
calls use the test's configured control plane count; update both occurrences
(the call assigning sortedHosts and the other call at the other location) to
pass the scenario-driven ControlPlaneCount instead of
common.MinMasterHostsNeededForInstallationInHaMode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5a466c57-b6ea-4463-ac37-73b8aacf38fd

📥 Commits

Reviewing files that changed from the base of the PR and between 26f1247 and 10848b2.

📒 Files selected for processing (3)
  • internal/bminventory/inventory.go
  • internal/host/host_test.go
  • internal/host/monitor.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant