bug(containerd): DeadlineExceeded sometimes happens #2069

Closed
wolfdate25 opened this issue Nov 21, 2024 · 2 comments
Labels: bug (Something isn't working)

@wolfdate25

What happened:
We found that the cronjob stops executing while its pod still shows as Running. The kubelet logged the following:

Nov 20 06:50:36 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:50:36.124178    3828 remote_runtime.go:366] "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="60d9681fc1f51cca1dd96c6694b145587c0a0ebd1faea39eed9eb209634bba9e"
Nov 20 06:50:36 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:50:36.124233    3828 kuberuntime_container.go:784] "Container termination failed with gracePeriod" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="hbt/hbt-cronjob-x-28867922-nh7sp" podUID="1db2521d-ce5b-4cf9-94c8-b6cdf7450e4c" containerName="x-test" containerID="containerd://60d9681fc1f51cca1dd96c6694b145587c0a0ebd1faea39eed9eb209634bba9e" gracePeriod=30
Nov 20 06:50:36 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:50:36.124253    3828 kuberuntime_container.go:822] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="hbt/hbt-cronjob-x-28867922-nh7sp" podUID="1db2521d-ce5b-4cf9-94c8-b6cdf7450e4c" containerName="x-test" containerID={"Type":"containerd","ID":"60d9681fc1f51cca1dd96c6694b145587c0a0ebd1faea39eed9eb209634bba9e"}

Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.265655    3828 remote_runtime.go:366] "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="bf0bf19f3e0dde9cbb488a5a0badc5815c95cac630aa4050e664339e1e1be263"
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.265720    3828 kuberuntime_container.go:784] "Container termination failed with gracePeriod" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="hbt/hbt-cronjob-x-28867982-pnbpf" podUID="78c53ed8-d7bf-4f1a-a0b6-4a22cbb0623b" containerName="x-test" containerID="containerd://bf0bf19f3e0dde9cbb488a5a0badc5815c95cac630aa4050e664339e1e1be263" gracePeriod=30
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.265745    3828 kuberuntime_container.go:822] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="hbt/hbt-cronjob-x-28867982-pnbpf" podUID="78c53ed8-d7bf-4f1a-a0b6-4a22cbb0623b" containerName="x-test" containerID={"Type":"containerd","ID":"bf0bf19f3e0dde9cbb488a5a0badc5815c95cac630aa4050e664339e1e1be263"}
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.891077    3828 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" podSandboxID="fa8984ad470a4a35388386e60c3e4dc250761b61f25bc3a0ddced413e677f264"
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.891134    3828 kuberuntime_manager.go:1389] "Failed to stop sandbox" podSandboxID={"Type":"containerd","ID":"fa8984ad470a4a35388386e60c3e4dc250761b61f25bc3a0ddced413e677f264"}
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.891189    3828 kubelet.go:2058] [failed to "KillContainer" for "node" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "16bdee7d-159d-4344-b70c-d6cdd133520d" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
Nov 20 06:51:22 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: E1120 06:51:22.891202    3828 pod_workers.go:1298] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"node\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillPodSandbox\" for \"16bdee7d-159d-4344-b70c-d6cdd133520d\" with KillPodSandboxError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"]" pod="jenkins/node" podUID="16bdee7d-159d-4344-b70c-d6cdd133520d"
Nov 20 06:51:23 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: I1120 06:51:23.740012    3828 kuberuntime_container.go:779] "Killing container with a grace period" pod="jenkins/node" podUID="16bdee7d-159d-4344-b70c-d6cdd133520d" containerName="node" containerID="containerd://f7067ff850c7940e3c0c963e506810966ada83c1ed4b86760b421b5136734df7" gracePeriod=30
Nov 20 06:51:23 ip-10-130-21-94.ap-northeast-2.compute.internal kubelet[3828]: I1120 06:51:23.743690    3828 status_manager.go:863] "Pod was deleted and then recreated, skipping status update" pod="jenkins/node" oldPodUID="16bdee7d-159d-4344-b70c-d6cdd133520d" podUID="32f2e071-f648-46e7-b00c-ff2b1fc9258f"

When attempting to remove the container using crictl, the following logs were generated by containerd:

Nov 21 01:53:00 ip-10-130-3-113.ap-northeast-2.compute.internal containerd[1111853]: time="2024-11-21T01:53:00.777374390Z" level=info msg="Kill container \"298d33cc227f7cfe87259d109e904d026120e4c74332ed00c80e08648cc050d3\""
Nov 21 01:53:02 ip-10-130-3-113.ap-northeast-2.compute.internal containerd[1111853]: time="2024-11-21T01:53:02.777460445Z" level=error msg="StopContainer for \"298d33cc227f7\" failed" error="rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \"298d33cc227f7cfe87259d109e904d026120e4c74332ed00c80e08648cc050d3\" to be killed: wait container \"298d33cc227f7cfe87259d109e904d026120e4c74332ed00c80e08648cc050d3\": context deadline exceeded"
Nov 21 01:53:04 ip-10-130-3-113.ap-northeast-2.compute.internal containerd[1111853]: time="2024-11-21T01:53:04.284946616Z" level=error msg="StopPodSandbox for \"f793baa40a74beedd902140d007d16bab953a0c4ae1c8005f9216481f97db1df\" failed" error="rpc error: code = DeadlineExceeded desc = failed to stop container \"a30c5fecddce111da50a3cd3689d6b08f6e6d7e33f2428bebf5a64fbf3d0f22f\": an error occurs during waiting for container \"a30c5fecddce111da50a3cd3689d6b08f6e6d7e33f2428bebf5a64fbf3d0f22f\" to be killed: wait container \"a30c5fecddce111da50a3cd3689d6b08f6e6d7e33f2428bebf5a64fbf3d0f22f\": context deadline exceeded"
Nov 21 01:53:04 ip-10-130-3-113.ap-northeast-2.compute.internal containerd[1111853]: time="2024-11-21T01:53:04.311928723Z" level=info msg="StopContainer for \"a30c5fecddce111da50a3cd3689d6b08f6e6d7e33f2428bebf5a64fbf3d0f22f\" with timeout 180 (s)"
Nov 21 01:53:04 ip-10-130-3-113.ap-northeast-2.compute.internal containerd[1111853]: time="2024-11-21T01:53:04.312501683Z" level=info msg="Skipping the sending of signal terminated to container \"a30c5fecddce111da50a3cd3689d6b08f6e6d7e33f2428bebf5a64fbf3d0f22f\" because a prior stop with timeout>0 request already sent the signal"
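
As a rough check on the containerd log above, the gap between the `Kill container` line and the `StopContainer ... failed` line for container `298d33cc227f7` can be computed from their `time=` fields. This is only an illustrative sketch: `parse_ns` and `elapsed_ns` are hypothetical helper names (not part of containerd or crictl), and the two sample lines are abbreviated copies of the log entries above.

```python
import re
from datetime import datetime, timezone

# Matches the time="...Z" field that containerd writes, with a fractional part.
TS = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z"')


def parse_ns(line: str) -> int:
    """Return the line's timestamp as integer nanoseconds since the Unix epoch."""
    m = TS.search(line)
    if m is None:
        raise ValueError("no time= field found in line")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    # strptime's %f stops at microseconds, so handle the 9-digit fraction by hand.
    nanos = int(m.group(2).ljust(9, "0")[:9])
    return int(base.timestamp()) * 1_000_000_000 + nanos


def elapsed_ns(first: str, second: str) -> int:
    """Nanoseconds between two log lines."""
    return parse_ns(second) - parse_ns(first)


# Abbreviated copies of the two log lines above (message text shortened).
KILL_LINE = 'time="2024-11-21T01:53:00.777374390Z" level=info msg="Kill container ..."'
FAIL_LINE = 'time="2024-11-21T01:53:02.777460445Z" level=error msg="StopContainer ... failed"'

if __name__ == "__main__":
    print(f"{elapsed_ns(KILL_LINE, FAIL_LINE) / 1e9:.6f} s")
```

For these two lines the gap is about 2.0 seconds, so the deadline that expired during this crictl attempt was roughly two seconds. That is much shorter than the `gracePeriod=30` in the kubelet log and may just reflect the timeout of the crictl call itself.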

What you expected to happen:
Containers should be terminated and created normally without interrupting the cronjob's execution.

How to reproduce it (as minimally and precisely as possible):
1. Set up a Kubernetes cluster with containerd version 1.7.22 or 1.7.23.
2. Deploy a cronjob and wait a few hours.
3. Observe the container termination and creation processes (the cronjob lifecycle).
4. Look for DeadlineExceeded errors in the kubelet and containerd logs.
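
The last step can be done by eye, but a small filter makes it easier to see which runtime calls are timing out. The following is a minimal sketch that assumes the kubelet line format shown in the logs above; `count_deadline_errors` is a hypothetical helper name, and the sample lines are shortened copies of those log entries.

```python
import re
from collections import Counter

# Captures the quoted kubelet message and the gRPC status code that follows it.
# The optional spaces tolerate logs where line wrapping swallowed a space.
PATTERN = re.compile(
    r'\] "(?P<msg>[^"]+)" err="rpc ?error: code ?= ?(?P<code>\w+)'
)


def count_deadline_errors(lines):
    """Count DeadlineExceeded errors per kubelet message."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m and m.group("code") == "DeadlineExceeded":
            counts[m.group("msg")] += 1
    return counts


# Shortened copies of kubelet lines from this issue (IDs truncated).
SAMPLE = [
    'E1120 06:50:36.124178 3828 remote_runtime.go:366] "StopContainer from runtime service failed" '
    'err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"',
    'E1120 06:51:22.891077 3828 remote_runtime.go:222] "StopPodSandbox from runtime service failed" '
    'err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"',
    'I1120 06:51:23.740012 3828 kuberuntime_container.go:779] "Killing container with a grace period" '
    'pod="jenkins/node"',
]

if __name__ == "__main__":
    for message, n in count_deadline_errors(SAMPLE).items():
        print(n, message)
```

On a node, the same function could be applied to the lines produced by `journalctl -u kubelet` instead of the inline sample.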

Environment:
- AWS Region: ap-northeast-2
- Instance Type(s): m7i-flex
- Cluster Kubernetes version: 1.30
- Node Kubernetes version: v1.30.6-eks-94953ac
- AMI Version: 1.30.6-20241115

wolfdate25 added the bug label Nov 21, 2024
@cartermckinnon
Member

> Container termination failed with gracePeriod

That looks like an issue with your specific pod; please open a case with AWS support 👍

cartermckinnon closed this as not planned Nov 21, 2024
@wolfdate25
Author

@cartermckinnon #2070 (comment)
