fix(alerts): set severity of 'etcdMembersDown' from 'critical' to 'wa… #19300

Merged: ahrtr merged 1 commit into etcd-io:main on Jan 31, 2025

sebastiangaiser
Contributor

…rning'

Downgraded the severity of 'etcdMembersDown' from 'critical' to 'warning', since a single unavailable etcd member should not endanger etcd's quorum. If quorum is lost, 'etcdInsufficientMembers' should fire instead. In addition, the 'for' interval was extended from '10m' to '20m' because, for example, rebooting a large physical node usually takes longer than 10 minutes.
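
The quorum reasoning above can be illustrated with a small sketch. It is not taken from the etcd alert rules: the function names and the severity mapping are hypothetical, and it only relies on the fact that a majority of voting members is needed for quorum. Note that the "one member down is harmless" argument holds for clusters of three or more members.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def has_quorum(members: int, down: int) -> bool:
    """True if the members that are still up form a majority."""
    return members - down >= quorum_size(members)


def expected_severity(members: int, down: int) -> str:
    """Hypothetical mapping from cluster state to the severities discussed here."""
    if down == 0:
        return "none"
    # Quorum intact: worth a warning. Quorum lost: critical.
    return "warning" if has_quorum(members, down) else "critical"


if __name__ == "__main__":
    for members in (3, 5):
        for down in range(members + 1):
            print(members, down, expected_severity(members, down))
```

In this model a 3-member cluster with one member down keeps quorum (2 of 3), while two members down loses it, which is the case the PR leaves to 'etcdInsufficientMembers'.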

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
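
The effect of extending the 'for' interval described above can be sketched with a simplified model: the condition has to hold at every evaluation for the whole interval before the alert fires. The one-minute evaluation step, the function name, and the 15-minute reboot are assumptions chosen for illustration, not values from the PR.

```python
from typing import Iterable, Optional


def first_firing_minute(condition_by_minute: Iterable[bool],
                        for_minutes: int) -> Optional[int]:
    """Return the minute at which the alert would start firing, or None.

    Simplified model of a 'for' clause: the condition must stay true at
    every one-minute evaluation for `for_minutes` before the alert fires.
    Any evaluation where the condition is false resets the pending state.
    """
    pending_since: Optional[int] = None
    for minute, active in enumerate(condition_by_minute):
        if not active:
            pending_since = None
            continue
        if pending_since is None:
            pending_since = minute
        if minute - pending_since >= for_minutes:
            return minute
    return None


# Hypothetical reboot: the member is down for 15 minutes, then healthy again.
reboot = [True] * 15 + [False] * 30

if __name__ == "__main__":
    print(first_firing_minute(reboot, 10))
    print(first_firing_minute(reboot, 20))
```

Under these assumptions the 15-minute reboot fires the alert with a 10-minute interval but stays pending and then resolves with a 20-minute interval.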

@k8s-ci-robot

Hi @sebastiangaiser. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@serathius
Member

/ok-to-test

codecov bot commented Jan 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.82%. Comparing base (f5973c9) to head (575b484).
Report is 4 commits behind head on main.

Additional details and impacted files

see 21 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19300      +/-   ##
==========================================
- Coverage   68.85%   68.82%   -0.04%     
==========================================
  Files         420      420              
  Lines       35693    35693              
==========================================
- Hits        24577    24564      -13     
- Misses       9692     9710      +18     
+ Partials     1424     1419       -5     


@sebastiangaiser
Contributor Author

sebastiangaiser commented Jan 29, 2025

@serathius thank you for starting the tests.
I tried to address the failing tests, but I think the tests that are still failing are unrelated to my changes 🤔

Member

@ahrtr ahrtr left a comment

LGTM

Thanks

@serathius
Member

/retest

@k8s-ci-robot

k8s-ci-robot commented Jan 30, 2025

@sebastiangaiser: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-release-tests | d1e59fb | link | false | /test pull-etcd-release-tests |

@ahrtr
Member

ahrtr commented Jan 30, 2025

The failure in pull-etcd-release-tests has already been fixed by @ivanvc in kubernetes/test-infra#34236, but you have to resubmit the PR (e.g. git commit --amend, then force-push) to pick up the fix. It's OK to merge this PR directly.

…rning'

Downgraded the severity of 'etcdMembersDown' from 'critical' to 'warning', since a single unavailable etcd member should not endanger etcd's quorum. If quorum is lost, 'etcdInsufficientMembers' should fire instead. In addition, the 'for' interval was extended from '10m' to '20m' because, for example, rebooting a large physical node usually takes longer than 10 minutes.

Signed-off-by: Sebastian Gaiser <[email protected]>
@sebastiangaiser
Contributor Author

@ahrtr I rebased the PR, so the tests should be green now 😄

@ahrtr
Member

ahrtr commented Jan 30, 2025

cc @fuweid @ivanvc @jmhbnz @serathius

Member

@ivanvc ivanvc left a comment

LGTM. Thanks, @sebastiangaiser.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ivanvc, sebastiangaiser

@ahrtr ahrtr merged commit df6ecb2 into etcd-io:main Jan 31, 2025
34 checks passed
5 participants