[release-ocm-2.14] MGMT-23971: Add a timeout for installing-pending-user-action #10260
Hosts in `installing-pending-user-action` were stalling entire clusters even if the cluster would install fine without them. This commit adds a timeout for hosts in this state so that the cluster can succeed when the required minimum number of nodes have already installed. This is especially important in cases where very large clusters are being installed (~100 nodes). In these kinds of cases, one or two worker nodes shouldn't force the user to reinstall the entire thing if they don't want to monitor the multi-hour install process for hosts failing to reboot.
Resolves https://redhat.atlassian.net/browse/MGMT-23971
Assisted-By: Claude Code
The mapping was slightly more efficient, but for small lists of hosts and small lists of statuses this change is fine and much easier to read.
These will be needed so that the hosts package can evaluate whether a cluster is ready to move install states.
Without this it would be possible for a host to time out in installing-pending-user-action even when that host was required for the entire cluster to succeed (for example, control plane nodes, or workers in non-compact clusters). If enough other hosts have installed for the cluster to finish _and_ the host has spent over an hour in pending-user-action, then it will time out to allow the user to use the cluster.
@carbonin: This pull request references MGMT-23971 which is a valid jira issue.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: carbonin
Codecov Report
@@ Coverage Diff @@
## release-ocm-2.14 #10260 +/- ##
=================================================
Coverage 42.87% 42.88%
=================================================
Files 380 380
Lines 67872 67887 +15
=================================================
+ Hits 29102 29111 +9
- Misses 36150 36152 +2
- Partials 2620 2624 +4
/test subsystem-aws
/lgtm
/test e2e-ai-operator-ztp
@carbonin: all tests passed!
Merged commit 18c02d3 into openshift:release-ocm-2.14
This is a manual cherry-pick of #10202