[docker installed with snap] HA Creation - Error: failed to create cluster: failed to copy certificate ca.crt: exit status 1 #724
2 is invalid in kubeadm IIRC? |
Ah, I hadn't seen that in the documentation yet... this was my first attempt to spin up an HA cluster. I'll try it again with 3 control planes like the example and make sure that works. Assuming that's my problem, it would be nice to validate the configuration and throw a friendly error when people try to do this (assuming I'm not the only one to ever do it). I'm not really a go guy but I may see if I can figure something out and submit a pull request as penance for my not reading the documentation. Thanks! |
Yes, sorry about that. I'm digging around trying to find something authoritative on the kubeadm side of things; I can't actually find anything now, but I could have sworn that it only supports 3 control planes specifically. If/when we get that confirmed we should document it in the kind docs and validate. If that's not true, then we have some bug here to fix 😅 |
So we should validate this to be odd, but we probably also need to look at getting some docs to clarify this upstream. The cert copy issue may be unrelated to the number, however; I'll circle back to this one as I'm triaging a few other issues. |
I am pretty sure 2 cp 3w kind cluster worked for me recently. It should
create fine and work, but etcd cannot make decisions. You need 3 cp for
that.
|
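The quorum arithmetic behind that advice can be sketched in a few lines of Go (a standalone illustration, not kind code): etcd commits a write only when a strict majority of members agree, so a 2-member cluster tolerates exactly as many failures as a 1-member cluster — none.

```go
package main

import "fmt"

// quorum is the number of etcd members that must agree before a
// write commits: a strict majority of the member count.
func quorum(members int) int { return members/2 + 1 }

// faultTolerance is how many members can fail while the remaining
// members can still reach quorum.
func faultTolerance(members int) int { return members - quorum(members) }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		// 2 members need both present (quorum=2), so tolerance is 0 —
		// the same as a single member, which is why odd sizes are advised.
		fmt.Printf("%d member(s): quorum=%d, tolerates %d failure(s)\n",
			n, quorum(n), faultTolerance(n))
	}
}
```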
I just tried a config with 3 control-planes and got the same error. It is worth noting I might be trying to do too much on my dev machine. I'm using microk8s on this box for my development, I was really just using kind to spin up a quick larger cluster for testing some scaling behavior. But I'll try to dig into this a bit. |
You could be hitting resource limits of the machine. For instance, on a 16 GB VM I cannot run more than 3 control planes. Yet I do not see copy errors; instead, the 4th apiserver starts crash-looping.
|
I tracked it down, but I'll need some guidance on how to fix it correctly if you'd like a pull request. As mentioned, my Go skills are near non-existent, so if it's easier for one of you to make this change I won't be offended; hopefully the below helps...

TL;DR: I'm running docker from a snap, so docker doesn't have access to the host's /tmp directory that kind uses to copy around certs, etc... I found some helpful info about snaps and their directory layout, and the following diff works around the problem on my machine:

diff --git a/pkg/cluster/internal/create/actions/kubeadmjoin/join.go b/pkg/cluster/internal/create/actions/kubeadmjoin/join.go
index 5283d98..b1b26e2 100644
--- a/pkg/cluster/internal/create/actions/kubeadmjoin/join.go
+++ b/pkg/cluster/internal/create/actions/kubeadmjoin/join.go
@@ -31,7 +31,6 @@ import (
"sigs.k8s.io/kind/pkg/cluster/nodes"
"sigs.k8s.io/kind/pkg/concurrent"
"sigs.k8s.io/kind/pkg/exec"
- "sigs.k8s.io/kind/pkg/fs"
)
// Action implements action for creating the kubeadm join
@@ -145,13 +144,9 @@ func runKubeadmJoinControlPlane(
// creates a temporary folder on the host that should acts as a transit area
// for moving necessary cluster certificates
- tmpDir, err := fs.TempDir("", "")
- if err != nil {
- return err
- }
- defer os.RemoveAll(tmpDir)
+ var tmpDir = "/home/jon/snap/docker/current/tmp"
- err = os.MkdirAll(filepath.Join(tmpDir, "/etcd"), os.ModePerm)
+ var err = os.MkdirAll(filepath.Join(tmpDir, "/etcd"), os.ModePerm)
if err != nil {
return err
}
@@ -170,11 +165,11 @@ func runKubeadmJoinControlPlane(
tmpPath := filepath.Join(tmpDir, fileName)
// copies from bootstrap control plane node to tmp area
if err := controlPlaneHandle.CopyFrom(containerPath, tmpPath); err != nil {
- return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+ return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", controlPlaneHandle, fileName, containerPath, tmpPath)
}
// copies from tmp area to joining node
if err := node.CopyTo(tmpPath, containerPath); err != nil {
- return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+ return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", node, fileName, tmpPath, containerPath)
}
} |
so ... the problem is that the folder created by `fs.TempDir` on the host isn't visible to a snap-confined docker |
Yeah, snap's model is to sandbox each installed application unless the snap is distributed with classic confinement, but docker uses the strict confinement policy.
It feels like /tmp is the correct thing for Go's os API to return, and certainly in this situation where kind is asking for a temp directory... I can't imagine a way of changing TempDir() to account for docker (not kind) running in a snap. One easy check to figure this out...
|
"1" seems like the better option to me. |
Ah so, Docker installed via snap is listed under known issues, which mentions setting TEMPDIR. We don't need to use a temp directory to copy these in the first place and we should consider refactoring that, but in the meantime:
* we have our own method that wraps this to internally deal with a minor oddity of the macOS platform... Note that we do need a tempdir for building node images, so if you do that you will find the same issue there. Otherwise we've avoided this
I'd close this ticket, as the issue is documented in https://github.com/kubernetes-sigs/kind/blob/master/site/content/docs/user/known-issues.md#docker-installed-with-snap, and extend the issue template to ask users to look at the list of known issues before reporting. |
We should still prevent and/or detect this on failure. We could avoid the staging tempdir for this usage at least, but for the node build we can't. We could start on #39 and detect this in the failure diagnostics and point to the docs automatically (so it stays out of the "happy path"). I've still been discussing designs for that, though... |
Ah, yep, setting TEMPDIR is super simple, so I'll just set that up in my scripts... thanks. And yeah, I had looked at the known issues, but I was fixating on the error message, and the way docker-in-snap is documented on the Known Issues page it didn't jump out at me. Maybe it would be helpful to add some detail to the error message in this case, like below? In the current form the error doesn't mention the temp path (though the debug output does). Thanks for the quick help on this, and feel free to close this issue at your convenience.

tmpPath := filepath.Join(tmpDir, fileName)
// copies from bootstrap control plane node to tmp area
if err := controlPlaneHandle.CopyFrom(containerPath, tmpPath); err != nil {
- return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+ return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", controlPlaneHandle, fileName, containerPath, tmpPath)
}
// copies from tmp area to joining node
if err := node.CopyTo(tmpPath, containerPath); err != nil {
- return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+ return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", node, fileName, tmpPath, containerPath)
}
} |
/close
I think that logging has improved a lot in newer versions, thanks to Ben |
@aojea: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
the tempdir will be gone in the next release #1023 |
What happened:
Running kind to create an HA cluster like the one found here (except with 2 control-planes instead of 3)
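A config like the one described would look roughly like the sketch below. This is an assumption reconstructed from the thread (2 control-plane nodes, and "2 cp 3w" mentioned above); the v1alpha3 apiVersion matches the kind v0.4.0 era reported in this issue, so verify it against your release's docs.

```yaml
# kind-cluster.yaml — hypothetical sketch of the reported config
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
```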
What you expected to happen:
Cluster to get created and come up
How to reproduce it (as minimally and precisely as possible):
kind create cluster --retain --loglevel trace --config "./kind-cluster.yaml" --wait 5m;
Anything else we need to know?:
Creating a single control-plane cluster works fine on this machine. Deleted and recreated several times to verify
Debug logging output:
Environment:
- kind version (use `kind version`): v0.4.0
- Kubernetes version (use `kubectl version`):
- Docker version (use `docker info`):
- OS (e.g. from `/etc/os-release`):