Fix lost recovery notifications after recovery outside of notification time period #10187
This is fixed by using Checkable::GetStateBeforeSuppression() only where relevant.
Not all calls to Checkable::NotificationReasonApplies() need GetStateBeforeSuppression() to be checked. In fact, for one caller, FireSuppressedNotifications() in lib/notification/notificationcomponent.cpp, the state before suppression may not even be initialized properly, so the default value of OK is used, which can lead to incorrect return values. Note the difference between suppressions happening at the Checkable object level and at the Notification object level. Only the former sets the state before suppression in the Checkable object, but so far, the latter also used that value, incorrectly.
This commit moves the check of GetStateBeforeSuppression() from Checkable::NotificationReasonApplies() to the one place where it's actually relevant: Checkable::FireSuppressedNotifications(). This made the existing call to NotificationReasonApplies() unnecessary as it would always return true: the type argument is computed based on the current check result, so there's no need to check it against the current check result.
Tests
I've written a short stand-alone Icinga 2 config file that can be used to reproduce the bug. It uses some wizardry you wouldn't want in a production config to create a host that starts in a problem state and recovers 60 seconds after startup. In addition, it configures a notification with a dynamically generated time period that spans the full current day but has a gap between 30 seconds and 90 seconds after daemon startup. Thus, the recovery notification should be sent and logged 90 seconds after daemon startup.
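The expected timing can also be reasoned about with a small stand-alone model. The following C++ sketch is not Icinga 2 code: all names in it (HostStateAt, InsidePeriod, RecoverySentAt, consultCheckableState) are hypothetical, and it assumes the old behavior boils down to additionally comparing the current state against a Checkable-level state before suppression that was never set and therefore still holds its default of OK.

```cpp
// Simplified, hypothetical model of the test scenario; not Icinga 2 code.
enum class State { Ok, Problem };

// Host state in the scenario: problem at startup, recovered from 60 s on.
State HostStateAt(int t) { return t < 60 ? State::Problem : State::Ok; }

// Time period in the scenario: active all day except for [30 s, 90 s).
bool InsidePeriod(int t) { return t < 30 || t >= 90; }

// Returns the second at which the recovery notification is sent,
// or -1 if it is never sent within the first 120 seconds.
//
// consultCheckableState == true models the assumed old behavior: the
// Notification-level code path also compares against the Checkable's
// state before suppression, which is never set in this scenario.
int RecoverySentAt(bool consultCheckableState)
{
    const State stateBeforeSuppression = State::Ok; // default, never updated
    bool recoveryPending = false;
    State previous = HostStateAt(0);

    for (int t = 1; t <= 120; ++t) {
        State current = HostStateAt(t);

        // A transition from problem to OK makes a recovery notification due.
        if (previous == State::Problem && current == State::Ok)
            recoveryPending = true;

        // Outside the time period the notification stays suppressed.
        if (recoveryPending && InsidePeriod(t)) {
            bool applies = (current == State::Ok);

            if (consultCheckableState)
                applies = applies && (current != stateBeforeSuppression);

            if (applies)
                return t;

            // Treated as no longer applicable and dropped.
            recoveryPending = false;
        }

        previous = current;
    }

    return -1;
}
```

Under these assumptions, RecoverySentAt(false) yields 90, matching the expected time stated above, while RecoverySentAt(true) yields -1, meaning the recovery notification is lost.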
icinga2.conf
It can easily be fired up in a container like this:
docker run --rm -it -v $(pwd)/icinga2.conf:/icinga2.conf:ro icinga/icinga2 timeout 2m icinga2 daemon -c /icinga2.conf
Current master branch
The problem notification is logged, though there's no recovery notification logged around the time when the output switches back to "inside test-period: true".
This PR
Comparison with PR #10032
My rationale for this PR is that it simplifies the code compared to #10032.
Looking purely at the functionality, there should be no difference between the two. Actually, if you take #10032, move the !(cr->GetState() == GetStateBeforeSuppression() && GetSuppressedNotifications() & stateNotifications) part proposed there to each call site of Checkable::NotificationReasonApplies() and then remove all the redundant checks, you should end up pretty much with this PR.
fixes #10025
closes #10032
refs #9207 (introduced the issue)