NO-ISSUE: Extend CRD deletion timeouts #2513
@petr-muller: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
@petr-muller: This pull request explicitly references no jira issue.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: petr-muller.
Needs approval from an approver in each of these files:
@petr-muller: The following tests failed, say
Similar to Justin's earlier proposal about Pod-related timeouts (#2497), this PR proposes extending the timeouts applied to CRD deletion. Most calls to the deletion helpers happen during test case cleanup, and we often see failures across various test cases that correlate with periods of high control plane load.
This is not an entirely uncommon flake: https://search.dptools.openshift.org/?search=deleting+CustomResourceDefinition%3A+context+deadline+exceeded&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
As with #2497, the longer timeouts could be carried downstream or argued for upstream, and there may be other ways to reduce control plane load so that these failures stop occurring. It is also possible that the timeouts are unrelated to control plane load and are instead a symptom of a real problem. I do not have solid evidence either way.
My approach (which I'd also propose for #2497) would be to run a timeboxed experiment with the extended timeouts, check whether we see the failure reduction we hope for, and then try moving the changes upstream.
/hold