Timeout on probe using container command doesn't work. Will CRI-O fix it. #17340
Labels: component/apps, kind/question, lifecycle/frozen, lifecycle/stale, priority/P2
I was told a long time ago that this issue cannot be fixed as long as docker is used, so it may never have had an issue created for it. I am creating one now because I want to know whether CRI-O will fix it.
The problem is that when you use a container command as a readiness or liveness probe, the timeout value for the probe doesn't work. That is, if the probe hangs or takes a long time to run, it is not failed when the timeout expires, nor is the command for the probe killed.
In the case of the probe taking a long time to run, once it does return, the normal period between probes elapses and the probe is run again. If, however, the probe hangs and never returns, it is never marked as failed, and no subsequent probe ever runs. So although the probe is failing, the pod is never marked as failed and restarted.
I have been told this problem can't be fixed because docker exec doesn't support a timeout value, so a running command cannot be interrupted.
Is this going to be fixed by CRI-O, or are timeouts on probes that use a container command never going to be supported?
Right now I have been advising people to implement their own timeout on probe execution in their probe script, or simply to avoid container commands for probes.
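
The workaround of enforcing the timeout inside the probe itself could look like the following minimal sketch. It assumes Python is available in the container image. The function name `run_with_timeout`, the one-second default, and the exit code 1 on timeout are illustrative choices, not anything Kubernetes or docker prescribes.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds=1.0):
    """Run the real check and return its exit code; a hang counts as failure.

    Hypothetical helper: the name and the defaults are illustrative only.
    """
    try:
        # subprocess.run kills the child process and raises TimeoutExpired
        # if the command has not finished within timeout_seconds.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return 1
    return completed.returncode


# Only act as a probe when a check command was passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_timeout(sys.argv[1:]))
```

The probe's container command would then invoke this wrapper with the real check as its arguments, so a hung check returns a non-zero exit code instead of blocking forever. Only the direct child is killed on timeout; a check that spawns its own background processes would need extra handling.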
Version
Steps To Reproduce
Create a liveness probe which uses a container command and have the command run sleep for a long period.
For example:
Current Result
Nothing happens after the default one-second timeout on the probe. No event indicates that the probe failed. You can get into the container and see that the first probe is still running:
The process ID for the sleep never changes, so it is the same process. It is not killed, nor is a subsequent probe run, at least not until the sleep finishes.
Expected Result
Should register a failure after one second. After two subsequent failures, the pod should be restarted.
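
The expected behaviour can be modelled as a small loop. This is not the kubelet's code, only a sketch of the semantics described here, assuming a one-second timeout and a threshold of three consecutive failures (the first failure plus two subsequent ones). The names `probe_once` and `should_restart`, and the `max_probes` cap that keeps the sketch finite, are hypothetical.

```python
import subprocess
import time


def probe_once(cmd, timeout_seconds=1.0):
    """Return True if the command exits 0 within the timeout, else False."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode == 0
    except subprocess.TimeoutExpired:
        # A hung probe is killed by subprocess.run and counted as a failure.
        return False


def should_restart(cmd, failure_threshold=3, timeout_seconds=1.0,
                   period_seconds=0.0, max_probes=10):
    """Probe repeatedly; return True once failures in a row reach the threshold."""
    consecutive_failures = 0
    for _ in range(max_probes):
        if probe_once(cmd, timeout_seconds):
            # Any success resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
        # Wait for the normal period between probes.
        time.sleep(period_seconds)
    return False
```

With a command that sleeps for a long period, each attempt in this model fails after the timeout rather than blocking, so the threshold is reached after three attempts.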
Additional Information
None.