Commit d4c9a12

RBMC: Check again for dead sibling service
During some bad path testing the sibling daemon on each BMC would make it past the existing check done to make sure it was running and then die. This would cause the wait for the sibling interface to be on D-Bus to time out. At that point each BMC became active since it thought the sibling daemon was fine and just the sibling BMC had the problem.

Fix this by checking again if the sibling daemon is running when the sibling interface still isn't on D-Bus after waiting for it. If it isn't, become passive.

Tested: This is seen on each BMC:

```
Waiting for sibling interface and/or heartbeat: Present = False, Heartbeat = False
Done waiting for sibling. Interface present = False, heartbeat = False
Sibling service state is failed
Role = xyz.openbmc_project.State.BMC.Redundancy.Role.Passive due to: Sibling BMC service is not running
```

Signed-off-by: Matt Spinler <[email protected]>
1 parent eb92044 commit d4c9a12

1 file changed: +9 -4 lines changed


redundant-bmc/src/manager.cpp

Lines changed: 9 additions & 4 deletions
@@ -54,7 +54,6 @@ Manager::Manager(sdbusplus::async::context& ctx,
     ctx.spawn(startup());
 }

-// clang-tidy currently mangles this into something unreadable
 // NOLINTNEXTLINE
 sdbusplus::async::task<> Manager::startup()
 {
@@ -80,13 +79,20 @@ sdbusplus::async::task<> Manager::startup()
         {
             co_await sibling.waitForSiblingUp();

-            if (previousRole == Role::Passive)
+            // Sibling service may have died. Check again.
+            if (!sibling->getInterfacePresent())
+            {
+                passiveRoleInfo = co_await determinePassiveRoleIfRequired();
+            }
+
+            // If passive previously, let sibling go first.
+            if (!passiveRoleInfo && (previousRole == Role::Passive))
             {
                 co_await sibling.waitForSiblingRole();
             }
         }

-        updateRole(determineRole());
+        updateRole(passiveRoleInfo.value_or(determineRole()));
     }

     spawnRoleHandler();
@@ -126,7 +132,6 @@ void Manager::startHeartbeat()
     ctx.spawn(doHeartBeat());
 }

-// clang-tidy currently mangles this into something unreadable
 // NOLINTNEXTLINE
 sdbusplus::async::task<> Manager::doHeartBeat()
 {
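
The role decision introduced by this change can be illustrated in isolation. The following is a minimal, synchronous sketch, not the code in manager.cpp: `SiblingState`, `passiveRoleIfRequired()`, `chooseRole()`, and the simplified `determineRole()` are hypothetical stand-ins, and the sketch assumes the re-check only forces the passive role when the sibling interface is missing and the sibling service is no longer running.

```cpp
#include <optional>

// Hypothetical stand-ins for the types used in the diff.
enum class Role { Active, Passive };

struct SiblingState
{
    bool interfacePresent; // sibling interface seen on D-Bus after the wait
    bool serviceRunning;   // sibling daemon is still running
};

// Assumption: returns Passive only when the wait ended without the
// interface appearing and the sibling daemon itself has died.
std::optional<Role> passiveRoleIfRequired(const SiblingState& s)
{
    if (!s.interfacePresent && !s.serviceRunning)
    {
        return Role::Passive;
    }
    return std::nullopt;
}

// Placeholder for the normal role rules, which are not shown in the diff.
// It always picks Active here to keep the sketch small.
Role determineRole(const SiblingState&)
{
    return Role::Active;
}

// A forced passive role takes priority; otherwise fall back to the
// normal rules.
Role chooseRole(const SiblingState& s)
{
    const auto passive = passiveRoleIfRequired(s);
    return passive ? *passive : determineRole(s);
}
```

Note that `std::optional::value_or` evaluates its argument before the call, so `value_or(determineRole())` always runs `determineRole()`. The conditional in `chooseRole()` skips that call when a passive role was already chosen; the two forms give the same result as long as `determineRole()` has no side effects.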
