trap INT and TERM in CI #1627

Merged May 30, 2020 (3 commits)

BenTheElder
Member

EXIT is not handled consistently across shells

this will be obviated by #986 but the need to debug some timeouts is a bit more pressing and this change is tiny ...

@k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) May 27, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder


@k8s-ci-robot requested review from amwat and neolit123 May 27, 2020 20:44
@k8s-ci-robot added labels May 27, 2020: approved (indicates a PR has been approved by an approver from all required OWNERS files), size/XS (denotes a PR that changes 0-9 lines, ignoring generated files)
@@ -37,7 +37,7 @@ install_kind() {
 main() {
   # create temp dir and setup cleanup
   TMP_DIR=$(mktemp -d)
-  trap cleanup EXIT
+  trap cleanup INT TERM EXIT
Member Author

this script should already run cleanup on INT and TERM because it's bash, but the other script may be running under ash/dash (it's using sh)

Contributor

this comment is confusing me regarding trapping EXIT
https://unix.stackexchange.com/a/149093

Member Author

@BenTheElder May 27, 2020

"There's also what @Shawn said: Ash and Dash don't trap signals with EXIT."

that's the only relevant part really.

POSIX has requirements about what it looks like when the EXIT trap is called, but when it is called is underspecified. we explicitly want these signals trapped
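
To make the thread above concrete, here is a minimal sketch of a cleanup trap that does not rely on how a particular sh treats EXIT. `cleanup` and `TMP_DIR` follow the diff; the `CLEANED` guard and the explicit exit codes are assumptions added for illustration and are not part of this PR.

```sh
#!/bin/sh
# Sketch only. cleanup and TMP_DIR follow the diff above; the CLEANED guard
# and the exit codes (128 + signal number) are illustrative assumptions.

cleanup() {
  # Run the body once, even if a signal trap and the EXIT trap both fire.
  if [ -n "${CLEANED:-}" ]; then return 0; fi
  CLEANED=1
  if [ -n "${TMP_DIR:-}" ]; then rm -rf "$TMP_DIR"; fi
}

TMP_DIR=$(mktemp -d)

# EXIT covers normal termination. INT and TERM are trapped explicitly because,
# as noted in the thread, ash and dash do not run the EXIT trap on signals.
trap cleanup EXIT
trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM
```

With a single `trap cleanup INT TERM EXIT`, as in the diff, the shell resumes the script after the handler returns on INT or TERM unless `cleanup` itself exits; whether that matters depends on what `cleanup` does. The separate signal traps in this sketch exit explicitly instead.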

@aojea
Contributor

aojea commented May 27, 2020

/lgtm
it seems worth trying

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) May 27, 2020
@aojea
Contributor

aojea commented May 27, 2020

@k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) May 27, 2020
@aojea
Contributor

aojea commented May 27, 2020

2 jobs timed out without logs :/
#1627 (comment)

@BenTheElder
Member Author

/hold cancel
there are other reasons this may not work during timeouts; this change is still correct

@k8s-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) May 27, 2020
@aojea
Contributor

aojea commented May 27, 2020

/retest
then 😄

@BenTheElder
Member Author

notably I'm concerned about how podutils / our config is handling these, I think we need a longer grace period.

May 27 21:06:34.545: INFO: Running AfterSuite actions on all nodes

{"component":"entrypoint","file":"prow/entrypoint/run.go:164","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 40m0s timeout","time":"2020-05-27T21:29:25Z"}
{"component":"entrypoint","file":"prow/entrypoint/run.go:245","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","time":"2020-05-27T21:29:40Z"}

note those times: 15s is not even enough time for the ginkgo after-suite to finish.
the GCE jobs are on bootstrap.py
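
For readers unfamiliar with the grace-period pattern in the two log lines above, the following is a rough shell emulation: signal the process, wait up to a grace period, then force-kill it. `terminate_with_grace` is a hypothetical helper and not part of prow or kind; the logs do not say which signal is sent first, so TERM here is an assumption.

```sh
#!/bin/sh
# Hypothetical helper, not part of prow or kind: signal a process, wait up to
# a grace period in seconds, then force-kill it if it is still running.
# Assumption: TERM is the first signal; the logs above do not name it.
terminate_with_grace() {
  pid=$1
  grace=$2
  kill -TERM "$pid" 2>/dev/null || return 0   # process already gone
  waited=0
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    kill -0 "$pid" 2>/dev/null || return 0    # exited within the grace period
  done
  kill -KILL "$pid" 2>/dev/null
  return 1                                    # grace period expired
}
```

A caller would use it as `terminate_with_grace "$pid" 15`; a return status of 1 means the process had to be killed, which is the case the second log line reports. Any cleanup trap in the signaled process has to finish inside that window.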

@BenTheElder
Member Author

/test all

@BenTheElder
Member Author

/test all

@aojea
Contributor

aojea commented May 28, 2020

/test all

@k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) May 30, 2020
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) May 30, 2020
@BenTheElder
Member Author

/test all
seems to be working as intended so far, though we haven't seen the timeout behavior

@aojea
Contributor

aojea commented May 30, 2020

/test all
> seems to be working as intended so far, though we haven't seen the timeout behavior

Jordan merged a PR related to the test with timeouts, so I don't think it will show up again. If I read it correctly, the test should time out now, not the job

@BenTheElder
Member Author

timeouts are happening on other branches too.
/test all

@k8s-ci-robot
Contributor

@BenTheElder: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
pull-kind-e2e-kubernetes-1-18 | a66e2bd | link | /test pull-kind-e2e-kubernetes-1-18
pull-kind-e2e-kubernetes | a66e2bd | link | /test pull-kind-e2e-kubernetes

@BenTheElder
Member Author

At the very least this doesn't misbehave so far, merging so we can maybe start collecting logs elsewhere

@BenTheElder merged commit 9e8816b into kubernetes-sigs:master May 30, 2020
@BenTheElder deleted the all-the-traps branch May 30, 2020 18:58
Labels
approved (indicates a PR has been approved by an approver from all required OWNERS files)
cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
size/M (denotes a PR that changes 30-99 lines, ignoring generated files)