Some commands can hang for very long periods before failing; please consider adding timeout #104

Closed
smithfarm opened this issue Feb 27, 2020 · 9 comments

@smithfarm
Contributor

smithfarm commented Feb 27, 2020

Certain commands, such as "cephadm bootstrap" and "podman pull", can get stuck and hang for long periods of time, or even indefinitely.

It's worth considering running these commands under "timeout".

This is not just for interactive users sitting at a console - the primary purpose is to ensure that CI jobs fail within a reasonable amount of time when commands get stuck.

@sebastian-philipp
Contributor

how long is too long? what about deployments with a slow download connection?

smithfarm added a commit to smithfarm/ceph-salt that referenced this issue Feb 27, 2020
@smithfarm
Contributor Author

> how long is too long? what about deployments with a slow download connection?

The timeout could be configurable, and default to a large number, like one hour.
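
As a rough illustration of the idea (not the ceph-salt implementation), a configurable timeout with a one-hour default could look like the Python sketch below. The helper name `run_with_timeout`, the `CMD_TIMEOUT` environment variable, and the choice of exit code 124 (the status GNU `timeout` uses when a command times out) are all assumptions made for the example.

```python
import os
import subprocess

# Default of one hour, as suggested above; value is in seconds.
DEFAULT_TIMEOUT = 3600


def run_with_timeout(cmd, timeout=None):
    """Run cmd (a list of arguments) and return its exit code.

    Returns 124 if the command did not finish within the timeout.
    CMD_TIMEOUT is a hypothetical environment variable used here
    only to show how the limit could be made configurable.
    """
    if timeout is None:
        timeout = float(os.environ.get("CMD_TIMEOUT", DEFAULT_TIMEOUT))
    try:
        # subprocess.run kills the child and waits for it when the timeout expires.
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return 124
```

A CI job could then call something like `run_with_timeout(["cephadm", "bootstrap", ...])` and fail on a non-zero result. Note that only the direct child process is killed on timeout; anything it spawned may need separate cleanup.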

@sebastian-philipp
Contributor

Maybe we can improve this, if we separate the pull from calling bootstrap?

@bk201
Contributor

bk201 commented Mar 2, 2020

> Maybe we can improve this, if we separate the pull from calling bootstrap?

We already did that. Images are pulled before bootstrap.

@smithfarm
Contributor Author

Normally the cephadm bootstrap step completes within 5 minutes for me. But occasionally it just hangs. It just happened again today.

@sebastian-philipp
Contributor

we need to fix the underlying cause of this. Please have a look at where cephadm actually hangs.

@smithfarm
Contributor Author

> we need to fix the underlying cause of this. Please have a look at where cephadm actually hangs.

Indeed. Yesterday I found out that cephadm bootstrap logs to /var/log/ceph/cephadm.log :-)
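
For anyone wanting to check where a hung run stopped, a small sketch like the following prints the last few lines of that log. The helper name `tail` and the default of 20 lines are arbitrary choices for the example; only the log path comes from the comment above.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, errors="replace") as f:
        # A bounded deque keeps only the most recent n lines while reading.
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    # Path taken from the comment above; reading it may require root.
    for line in tail("/var/log/ceph/cephadm.log"):
        print(line, end="")
```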

@sebastian-philipp
Contributor

can we close this?

@ricardoasmarques
Contributor

I was not able to reproduce this issue.

One situation where ceph-salt commands will hang is when the Ceph cluster is stopped; that situation is tracked in a separate issue: #409

If there's any other situation where a timeout is still missing, feel free to reopen or submit a separate issue for that.
