Commit 8f04a2d

RBMC: Check again for dead sibling service
During some bad path testing the sibling daemon on each BMC would make it past the existing check done to make sure it was running and then die. This would cause the wait for the sibling interface to be on D-Bus to time out. At that point each BMC became active since it thought the sibling daemon was fine and just the sibling BMC had the problem.

Fix this by checking again if the sibling daemon is running when the sibling interface still isn't on D-Bus after waiting for it. If it isn't, become passive.

Tested: This is seen on each BMC:

```
Waiting for sibling interface and/or heartbeat: Present = False, Heartbeat = False
Done waiting for sibling. Interface present = False, heartbeat = False
Sibling service state is failed
Role = xyz.openbmc_project.State.BMC.Redundancy.Role.Passive due to: Sibling BMC service is not running
```

Signed-off-by: Matt Spinler <[email protected]>
1 parent 03fd466 commit 8f04a2d
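
The re-check described in the commit message can be sketched in isolation. This is a minimal illustration, not the code in `manager.cpp`: `SiblingState`, `RoleInfo`, and `recheckSibling` are hypothetical names, and the real `determinePassiveRoleIfRequired()` may consider conditions beyond the two modelled here.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; not the types used in manager.cpp.
enum class Role
{
    Active,
    Passive
};

struct SiblingState
{
    bool interfacePresent; // sibling interface is on D-Bus
    bool serviceRunning;   // sibling daemon's service is running
};

struct RoleInfo
{
    Role role;
    std::string reason;
};

// Called after the wait for the sibling has finished. If the interface never
// showed up and the sibling service is not running, force the passive role.
// Otherwise return nullopt so the normal role determination proceeds.
std::optional<RoleInfo> recheckSibling(const SiblingState& state)
{
    if (!state.interfacePresent && !state.serviceRunning)
    {
        return RoleInfo{Role::Passive, "Sibling BMC service is not running"};
    }
    return std::nullopt;
}
```

With this shape, the failure from the bad path testing (interface absent, service dead) yields a forced passive role on both BMCs, while a missing interface with a healthy service still falls through to the existing logic.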

1 file changed, +9 -4 lines changed

redundant-bmc/src/manager.cpp

Lines changed: 9 additions & 4 deletions
@@ -49,7 +49,6 @@ Manager::Manager(sdbusplus::async::context& ctx,
     ctx.spawn(startup());
 }

-// clang-tidy currently mangles this into something unreadable
 // NOLINTNEXTLINE
 sdbusplus::async::task<> Manager::startup()
 {
@@ -72,13 +71,20 @@ sdbusplus::async::task<> Manager::startup()
         {
             co_await sibling->waitForSiblingUp(siblingTimeout);

-            if (previousRole == Role::Passive)
+            // Sibling service may have died. Check again.
+            if (!sibling->getInterfacePresent())
+            {
+                passiveRoleInfo = co_await determinePassiveRoleIfRequired();
+            }
+
+            // If passive previously, let sibling go first.
+            if (!passiveRoleInfo && (previousRole == Role::Passive))
             {
                 co_await sibling->waitForSiblingRole();
             }
         }

-        updateRole(determineRole());
+        updateRole(passiveRoleInfo.value_or(determineRole()));
     }

     spawnRoleHandler();
@@ -118,7 +124,6 @@ void Manager::startHeartbeat()
     ctx.spawn(doHeartBeat());
 }

-// clang-tidy currently mangles this into something unreadable
 // NOLINTNEXTLINE
 sdbusplus::async::task<> Manager::doHeartBeat()
 {
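
One detail of the last changed line in the second hunk: `std::optional::value_or` takes its fallback as an ordinary argument, so `determineRole()` is evaluated even when `passiveRoleInfo` already holds a value. Whether that matters depends on what `determineRole()` does, which the diff does not show. The sketch below uses a hypothetical `determineRole` with a call counter and hypothetical `pickEager`/`pickLazy` helpers to make the difference observable.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for illustration; not the code in manager.cpp.
enum class Role
{
    Active,
    Passive
};

// Counts how often the fallback computation runs.
int determineRoleCalls = 0;

Role determineRole()
{
    ++determineRoleCalls;
    return Role::Active;
}

// Same shape as passiveRoleInfo.value_or(determineRole()):
// the fallback is always computed before value_or is entered.
Role pickEager(const std::optional<Role>& forced)
{
    return forced.value_or(determineRole());
}

// Alternative: compute the fallback only when no role was forced.
Role pickLazy(const std::optional<Role>& forced)
{
    return forced ? *forced : determineRole();
}
```

Both helpers return the same role for the same input. They differ only in whether the fallback runs when a forced role is present, which is relevant if the fallback logs, has side effects, or is expensive.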
