@spinler spinler commented Mar 13, 2025

During bad-path testing, the sibling daemon on each BMC would pass the
existing check that it was running and then die. The wait for the
sibling interface to appear on D-Bus would then time out. At that point
each BMC became active, because it assumed the sibling daemon was fine
and only the sibling BMC had the problem.

Fix this by checking again whether the sibling daemon is running when
the sibling interface still isn't on D-Bus after the wait. If the
daemon isn't running, become passive.
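
The re-check described above can be sketched roughly as follows. This is
an illustrative model only, not the code in this change: the function
name `role_after_wait`, the `Role` enum, and the assumption that a
service state of "active" means the daemon is running are all
hypothetical. Only the reason string is taken from the log output below.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical stand-in for the Redundancy.Role values.
    ACTIVE = "Active"
    PASSIVE = "Passive"


def role_after_wait(interface_present: bool, service_state: str):
    """Decide what to do once the wait for the sibling interface ends.

    Returns (role, reason) when the re-check forces a decision, or None
    when the normal role-selection rules (not modeled here) should run.
    """
    if interface_present:
        # The sibling interface is on D-Bus, so no re-check is needed.
        return None

    # The interface never appeared. Check the sibling daemon again, since
    # it may have died after passing the first is-it-running check.
    # Assumption: any state other than "active" counts as not running.
    if service_state != "active":
        return (Role.PASSIVE, "Sibling BMC service is not running")

    # The daemon is running, so the problem is treated as being on the
    # sibling BMC and the existing logic decides the role.
    return None
```

For example, `role_after_wait(False, "failed")` models the case seen in
testing, where both BMCs should end up passive instead of both becoming
active.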

Tested:

This is seen on each BMC:

```
Waiting for sibling interface and/or heartbeat: Present = False, Heartbeat = False
Done waiting for sibling. Interface present = False, heartbeat = False
Sibling service state is failed
Role = xyz.openbmc_project.State.BMC.Redundancy.Role.Passive due to: Sibling BMC service is not running
```

Change-Id: Id065e547462ea1c944e00c127552cc4642ce6744
Signed-off-by: Matt Spinler <[email protected]>
@spinler spinler force-pushed the 1120_dead_sibling_check branch from d4c9a12 to 3504e73 Compare July 16, 2025 16:22
@spinler spinler requested a review from RameshIyyar July 16, 2025 17:09