Timeout on probe using container command doesn't work. Will CRI-O fix it. #17340

GrahamDumpleton · 2017-11-16T01:28:10Z

I was told a long time ago that this issue cannot be fixed as long as docker is used, so an issue may never have been created for it. I am creating one now because I want to know whether CRI-O will fix it.

The problem is that when you use a container command as a readiness or liveness probe, the timeout value for the probe doesn't work. That is, if the probe hangs or takes a long time to run, it is not marked as failed when the timeout expires, nor is the probe command killed.

If the probe takes a long time to run, then once it does return, the normal period between probes elapses and the probe is run again. If, however, the probe hangs and never returns, it is never marked as failed, and no subsequent probes ever run. So although the probe is failing, the pod is never marked as failed and is never restarted.

I have been told this problem can't be fixed because docker exec doesn't support a timeout value that would allow a command to be interrupted.

Is this going to be fixed by CRI-O, or are timeouts on probes that use a container command never going to be supported?

Right now I have been advising people to implement their own timeout on probe execution in their probe script, or simply to avoid container commands for probes.
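As an illustration of the "implement your own timeout in the probe script" workaround, here is a minimal sketch in Python. It assumes Python is available in the container image. The function name `run_probe_with_timeout` and the exit code 124 (borrowed from the coreutils `timeout` convention) are illustrative choices, not anything OpenShift requires; any non-zero exit code fails the probe.

```python
import subprocess
import sys

# Assumption: 124 mirrors the coreutils `timeout` convention. The probe only
# needs a non-zero exit code to be treated as a failure.
TIMEOUT_EXIT_CODE = 124


def run_probe_with_timeout(command, timeout_seconds):
    """Run the probe command and return its exit code.

    Returns TIMEOUT_EXIT_CODE if the command runs longer than timeout_seconds.
    """
    try:
        # When the timeout expires, subprocess.run kills the child process,
        # waits for it, and then raises TimeoutExpired.
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE
    return completed.returncode
```

A probe script could then end with something like `sys.exit(run_probe_with_timeout(["sleep", "300"], 1))`, so the probe command itself exits within about a second regardless of whether the platform enforces the timeout. Note that only the direct child process is killed; a probe that spawns its own children would need extra handling.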

Version

oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth

Server https://api.pro-us-east-1.openshift.com:443
openshift v3.6.173.0.21
kubernetes v1.6.1+5115d708d7

Steps To Reproduce

Create a liveness probe which uses a container command and have the command run sleep for a long period.

For example:

oc set probe dc/blog --liveness -- sleep 300

Current Result

Nothing happens after the default one second timeout on the probe. There is no event indicating failure of the probe. You can get into the container and see that the first probe is still running:

$ ps aux | grep sleep
1004820+     49  0.0  0.0   5888   612 ?        Ss   01:18   0:00 sleep 300
1004820+     99  0.0  0.0  10648   968 ?        S+   01:21   0:00 grep sleep

The process ID for the sleep never changes, so it is the same process: it is not being killed, and no subsequent probe is run, at least not until the sleep finishes.

Expected Result

The probe should register a failure after one second. After two subsequent failures, the pod should be restarted.
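The expected behaviour can be modelled with a small sketch. This is only an illustration of the semantics described here, not the kubelet's actual implementation: the function name `probes_until_restart` is hypothetical, a probe is treated as failed only when it overruns the timeout, and the defaults (a one second timeout, and a restart on the third consecutive failure) follow the description above.

```python
def probes_until_restart(probe_durations, timeout_seconds=1.0, failure_threshold=3):
    """Return the 1-based index of the probe that triggers a restart, or None.

    probe_durations lists how long (in seconds) each successive probe command
    would take to run. A probe counts as failed when it would run longer than
    timeout_seconds. Non-zero exit codes are not modelled in this sketch.
    """
    consecutive_failures = 0
    for index, duration in enumerate(probe_durations, start=1):
        if duration > timeout_seconds:
            consecutive_failures += 1
        else:
            # A successful probe resets the consecutive failure count.
            consecutive_failures = 0
        if consecutive_failures >= failure_threshold:
            return index
    return None
```

With the `sleep 300` probe from the reproduction steps, every probe overruns the timeout, so `probes_until_restart([300, 300, 300])` reports a restart on the third probe. In the current behaviour the first probe is never timed out, so that count never starts.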

Additional Information

None.

openshift-bot · 2018-02-25T17:28:39Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

GrahamDumpleton · 2018-02-26T02:44:44Z

/lifecycle frozen

sjenning · 2018-02-27T17:27:01Z

@mrunalp @runcom any information here?

pweil- assigned sjenning Nov 17, 2017

pweil- added component/apps kind/question priority/P2 labels Nov 17, 2017

openshift-ci-robot added the lifecycle/stale label Feb 25, 2018

openshift-ci-robot added the lifecycle/frozen label Feb 26, 2018

sjenning assigned mrunalp and runcom and unassigned sjenning Feb 27, 2018
