Skip to content

NO-ISSUE: installing-pending-user-action timeout docs update#10252

Open
carbonin wants to merge 1 commit intoopenshift:masterfrom
carbonin:pending-docs-update
Open

NO-ISSUE: installing-pending-user-action timeout docs update#10252
carbonin wants to merge 1 commit intoopenshift:masterfrom
carbonin:pending-docs-update

Conversation

@carbonin
Copy link
Copy Markdown
Member

@carbonin carbonin commented Apr 28, 2026

Document the timeout behavior in installing-pending-user-action

Followup to #10202

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see [CONTRIBUTING] guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Summary by CodeRabbit

  • Documentation
    • Updated and expanded installation state documentation with comprehensive timeout behavior guide, including configuration options, timeout mechanisms, and cluster recovery paths.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 28, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@carbonin: This pull request explicitly references no jira issue.

Details

In response to this:

Document the timeout behavior in installing-pending-user-action

Followup to #10202

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see [CONTRIBUTING] guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Documentation updates add timeout/error outcome to the Installing-pending-user-action host state in the architecture guide, and introduce comprehensive documentation describing install-state timeout mechanisms, host-level and cluster-level timeout handling, and configurability options.

Changes

Cohort / File(s) Summary
Installation State Documentation
docs/architecture.md, docs/installing-pending-user-action-timeout-behavior.md
Updated architecture documentation to clarify timeout behavior for the Installing-pending-user-action state; added new guide documenting entry mechanisms (wrong boot order detection, reboot-stage timeout), host-level timeout handling (60-minute default), cluster-level timeout behavior (24-hour installation timeout), and smart timeout viability checks with configurable parameters.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change as a documentation update for the installing-pending-user-action timeout behavior, and appropriately uses NO-ISSUE prefix to indicate no associated JIRA issue.
Description check ✅ Passed The PR description follows the template structure, correctly marks Documentation as the change type, sets testing appropriately to No tests needed, and provides context as a follow-up to PR #10202.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed This PR is a documentation-only change containing no Ginkgo test files or test code.
Test Structure And Quality ✅ Passed PR is purely documentation-focused with no Ginkgo test code modifications, so check requirements are not applicable.
Microshift Test Compatibility ✅ Passed This PR only updates markdown documentation files without adding any Ginkgo e2e tests or Go test code, so the custom check for MicroShift-incompatible APIs is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This pull request is documentation-only and does not add any Ginkgo e2e tests, so SNO test compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed This PR is a documentation-only change in the docs/ directory with no code modifications, deployment manifests, operator implementations, or scheduling-related logic.
Ote Binary Stdout Contract ✅ Passed PR contains only documentation updates to markdown files with no code modifications, making the stdout contract check inapplicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR contains only documentation changes with no new Ginkgo e2e tests, so the check is not applicable and passes by default.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 28, 2026
@openshift-ci openshift-ci Bot requested review from bluesort and eliorerz April 28, 2026 20:12
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/installing-pending-user-action-timeout-behavior.md (2)

31-34: Add language identifier to fenced code block.

For better markdown compliance, consider adding a language identifier to the fenced code block. Since this is a user-facing message rather than code, you can use text:

📝 Suggested improvement
-```
+```text
 Expected the host to boot from disk, but it booted the installation image - please 
 reboot and fix boot order to boot from disk [Model] [Serial] (disk-name, disk-id)

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/installing-pending-user-action-timeout-behavior.md around lines 31 -
34, Add the language identifier "text" to the fenced code block that contains
the user-facing message "Expected the host to boot from disk, but it booted the
installation image - please reboot and fix boot order to boot from disk [Model]
[Serial] (disk-name, disk-id)" so the markdown fence becomes text ... to
improve markdown compliance and rendering; locate the fenced block in
docs/installing-pending-user-action-timeout-behavior.md and update the opening
fence accordingly.


</details>

---

`66-71`: **Add language identifier to fenced code block.**

For better markdown compliance, consider adding a language identifier to the fenced code block. Since this is a user-facing message, you can use `text`:

<details>
<summary>📝 Suggested improvement</summary>

```diff
-```
+```text
 Host timed out when pulling the configuration files. Verify in the host console 
 that the host boots from the OpenShift installation disk (disk-name, disk-id) and 
 has network access to the cluster API. The installation will resume after the host 
 successfully boots and can access the cluster API
 ```
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/installing-pending-user-action-timeout-behavior.md around lines 66 -
71, Add a language identifier to the fenced code block containing the Host timed
out message by changing the opening triple backticks to text so the block begins with text and ends with ```; this ensures proper markdown highlighting
and compliance for the multi-line message (the fenced code block with the host
timeout instructions).


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/installing-pending-user-action-timeout-behavior.md:

  • Around line 254-257: Update the documentation references to point at the
    correct locations and function names: replace the incorrect cluster validation
    reference to enoughMastersAndWorkers with HasEnoughMastersAndWorkers and
    update its path to internal/common/common.go:939, and change the host counting
    reference for HostsInStatus to internal/common/common.go:916; ensure lines
    254-257 mention HostsInStatus at internal/common/common.go:916 and
    HasEnoughMastersAndWorkers at internal/common/common.go:939 so the doc links
    match the codebase.
  • Line 63: The docs reference only part of the reboot timeout transition; update
    the code reference range in the documentation so it covers the entire
    TransitionRule for the reboot timeout (including its closing brace) rather than
    stopping before it—locate the TransitionRule named/related to the reboot timeout
    in the statemachine and extend the documented line range so the full rule is
    included.

Nitpick comments:
In @docs/installing-pending-user-action-timeout-behavior.md:

  • Around line 31-34: Add the language identifier "text" to the fenced code block
    that contains the user-facing message "Expected the host to boot from disk, but
    it booted the installation image - please reboot and fix boot order to boot from
    disk [Model] [Serial] (disk-name, disk-id)" so the markdown fence becomes
    text ... to improve markdown compliance and rendering; locate the fenced
    block in docs/installing-pending-user-action-timeout-behavior.md and update the
    opening fence accordingly.
  • Around line 66-71: Add a language identifier to the fenced code block
    containing the Host timed out message by changing the opening triple backticks
    to text so the block begins with text and ends with ```; this ensures
    proper markdown highlighting and compliance for the multi-line message (the
    fenced code block with the host timeout instructions).

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository: openshift/coderabbit/.coderabbit.yaml

**Review profile**: CHILL

**Plan**: Enterprise

**Run ID**: `56490c98-0f24-4cca-8785-6c9ce5d59163`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between c9164d448f971c1cc3c9aad3cee04c9b833f6cab and ad5ad5044db55f800a5c9c14448fe856fae803b1.

</details>

<details>
<summary>⛔ Files ignored due to path filters (2)</summary>

* `docs/HostStatus.dot` is excluded by `!**/*.dot`
* `docs/HostStatus.png` is excluded by `!**/*.png`

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/architecture.md`
* `docs/installing-pending-user-action-timeout-behavior.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread docs/installing-pending-user-action-timeout-behavior.md
Comment on lines +254 to +257
**Code References:**
- Host timeout transition: `internal/host/statemachine.go` (HasPendingUserActionTimedOut condition)
- Cluster validation: `internal/cluster/transition.go:517-554` (enoughMastersAndWorkers function)
- Host counting: `internal/cluster/common.go:192-206` (HostsInStatus function)
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify the locations of viability check functions

# Search for enoughMastersAndWorkers function
echo "=== Searching for enoughMastersAndWorkers ==="
rg -n 'func.*enoughMastersAndWorkers' internal/

# Search for HostsInStatus function  
echo "=== Searching for HostsInStatus ==="
rg -n 'func.*HostsInStatus' internal/

# Search for HasEnoughMastersAndWorkers function
echo "=== Searching for HasEnoughMastersAndWorkers ==="
rg -n 'func.*HasEnoughMastersAndWorkers' internal/

Repository: openshift/assisted-service

Length of output: 416


🏁 Script executed:

# Check if enoughMastersAndWorkers exists in transition.go
echo "=== Checking transition.go for enoughMastersAndWorkers ==="
rg -A 5 'enoughMastersAndWorkers' internal/cluster/transition.go

# Check what's actually at lines 517-554 in transition.go
echo "=== Lines 517-554 in transition.go ==="
sed -n '517,554p' internal/cluster/transition.go | cat -n

# Check the context around HostsInStatus in common.go
echo "=== Lines 910-925 in common.go (around HostsInStatus) ==="
sed -n '910,925p' internal/common/common.go | cat -n

# Verify if transition.go calls HasEnoughMastersAndWorkers or similar
echo "=== Searching transition.go for HasEnoughMastersAndWorkers usage ==="
rg -B 2 -A 2 'HasEnoughMastersAndWorkers' internal/cluster/transition.go

Repository: openshift/assisted-service

Length of output: 3041


Correct the code reference paths for cluster viability checks.

The documentation references incorrect file paths and function names:

  • enoughMastersAndWorkers function does not exist; should reference HasEnoughMastersAndWorkers
  • HostsInStatus is located at internal/common/common.go:916, not internal/cluster/common.go:192-206
  • HasEnoughMastersAndWorkers is located at internal/common/common.go:939, not internal/cluster/transition.go:517-554

Update lines 254-257 to reference:

  • Host counting: internal/common/common.go:916 (HostsInStatus function)
  • Cluster validation: internal/common/common.go:939 (HasEnoughMastersAndWorkers function)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/installing-pending-user-action-timeout-behavior.md` around lines 254 -
257, Update the documentation references to point at the correct locations and
function names: replace the incorrect cluster validation reference to
`enoughMastersAndWorkers` with `HasEnoughMastersAndWorkers` and update its path
to `internal/common/common.go:939`, and change the host counting reference for
`HostsInStatus` to `internal/common/common.go:916`; ensure lines 254-257 mention
`HostsInStatus` at internal/common/common.go:916 and
`HasEnoughMastersAndWorkers` at internal/common/common.go:939 so the doc links
match the codebase.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rabbit has a point, the function moved :)

Code references in general are very hard to maintain, I would discourage their use, unless we find some method that would avoid the staleness issue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rccrdpccl, totally agree — line-number references in docs are notoriously fragile. A few approaches that can reduce (though not eliminate) the staleness problem:

  1. GitHub permalink to a specific commit SHA — Instead of internal/foo/bar.go:123, link to https://github.com/openshift/assisted-service/blob/<SHA>/internal/foo/bar.go#L123. The line stays frozen at that commit, so it's never silently wrong — though it does go stale in the sense that it won't track future refactors.

  2. Reference symbols/function names only, no line numbers — Prose like "see HasEnoughMastersAndWorkers in internal/common/common.go" is far more resilient than line numbers because symbol renames are usually grep-able, whereas line shifts are invisible.

  3. Automated link-rot check — A CI step (e.g., a small script) that greps for every symbol name mentioned in the doc and fails if any are missing from the codebase. This catches renames at PR time rather than months later.

For this PR, given the docs-only nature, the simplest fix would be to drop line numbers entirely and just reference function/file names. That way the references remain useful without the high maintenance cost.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 28, 2026

@carbonin: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants