
Failed to install v0.16.3 on minikube #3273

Closed

savitaashture opened this issue Sep 22, 2020 · 13 comments · Fixed by #3342
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@savitaashture
Contributor

Expected Behavior

  • Pipeline installation should succeed

Actual Behavior

  • After installing Pipelines v0.16.3 (kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.16.3/release.yaml), the controller and webhook pods fail
kubectl get pods -n tekton-pipelines  -w
NAME                                           READY   STATUS             RESTARTS   AGE
tekton-pipelines-controller-767f44b5f5-92q4n   0/1     CrashLoopBackOff   9          23m
tekton-pipelines-webhook-7f9888f9b-gl457       0/1     CrashLoopBackOff   9          23m

with the following error:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  <unknown>            default-scheduler  Successfully assigned tekton-pipelines/tekton-pipelines-webhook-7f9888f9b-gl457 to minikube
  Normal   Pulled     10m (x5 over 11m)    kubelet, minikube  Container image "gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/webhook:v0.16.3@sha256:5087d4022a4688990cf04eee003d76fc736a939b011a62a160c89ae5bd6b7c20" already present on machine
  Normal   Created    10m (x5 over 11m)    kubelet, minikube  Created container webhook
  Warning  Failed     10m (x5 over 11m)    kubelet, minikube  Error: failed to start container "webhook": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied": unknown
  Warning  BackOff    105s (x53 over 11m)  kubelet, minikube  Back-off restarting failed container

Steps to Reproduce the Problem

  1. Install released pipeline kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.16.3/release.yaml
  2. kubectl get pods -n tekton-pipelines

Additional Info

  • Minikube version:
minikube version
minikube version: v1.5.2

  • Kubernetes version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

  • Tekton Pipeline version:
kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'
v0.16.3
@skycyan

skycyan commented Sep 23, 2020

This is caused by the security settings in the Pod.

Temporary workaround:
Comment out the "securityContext" definition.

Warning:
Any change to the security settings needs to be tested before going to production.
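
For illustration, a rough sketch of that workaround using kubectl patch; the exact JSON path depends on where securityContext is set in the v0.16.3 release.yaml (it may also be set at the pod level), and the controller Deployment would need the same treatment:

# sketch, not the official fix: drop the container-level securityContext from the webhook Deployment
kubectl -n tekton-pipelines patch deployment tekton-pipelines-webhook \
  --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/securityContext"}]'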

@rannox

rannox commented Sep 29, 2020

I am experiencing the same problem on kubernetes v1.18.3 with tekton pipelines v0.16.3.
The version v0.14.2 worked fine.

@cten

cten commented Oct 5, 2020

@zops can you elaborate on what you mean? Thanks

@daviddyball

daviddyball commented Oct 6, 2020

I had this exact issue and had to edit the Deployments for both the webhook and the controller, changing runAsUser from 1001 -> 65532

Patch file (referenced from a kustomization.yaml) to fix it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-pipelines-controller
  namespace: tekton-pipelines
spec:
  template:
    spec:
      containers:
      - name: tekton-pipelines-controller
        securityContext:
          runAsUser: 65532
      securityContext:
        runAsUser: 65532
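
A minimal kustomization.yaml along these lines would pull that patch in; the file names (release.yaml, runasuser-patch.yaml) are assumptions, release.yaml is the downloaded v0.16.3 manifest, and the webhook Deployment needs an equivalent patch:

# kustomization.yaml (sketch; file names are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- release.yaml            # downloaded v0.16.3 release manifest
patchesStrategicMerge:
- runasuser-patch.yaml    # the Deployment patch above (add one for the webhook too)

Apply with: kubectl apply -k .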

@vdemeester
Member

cc @mattmoor @imjasonh
I see that the images do have "User": "65532", which makes me think we are actually doing something wrong in the controller.yaml. When did this change to user 65532?

Might be worth a bugfix release (0.17.1, and maybe even 0.16.4 😓)
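
For reference, one way to confirm the user baked into the image (assuming Docker is available locally; the digest is taken from the events above):

docker pull gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/webhook@sha256:5087d4022a4688990cf04eee003d76fc736a939b011a62a160c89ae5bd6b7c20
docker inspect --format '{{.Config.User}}' gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/webhook@sha256:5087d4022a4688990cf04eee003d76fc736a939b011a62a160c89ae5bd6b7c20
# expected output: 65532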

@daviddyball

daviddyball commented Oct 6, 2020

@vdemeester I saw that too... the image is running as 65532 but all manifests point to 1001

Edit: I would also point out that this happens for the webhook Deployment as well

@vdemeester vdemeester added this to the Pipelines v0.17 milestone Oct 6, 2020
@vdemeester
Member

vdemeester commented Oct 6, 2020

Let's fix this by using 65532 in the deployment (webhook and controller).
/assign

@bobcatfish @sbwsg @afrittoli @pritidesai @imjasonh @dibyom is it worth doing a 0.16.4? (my initial thought is yes 🙃)
It will be in 0.17.1 for sure 😉

@dibyom
Member

dibyom commented Oct 6, 2020

I think we'd need this for Triggers too: tektoncd/triggers#781

@vdemeester
Member

I think we'd need this for Triggers too: tektoncd/triggers#781

Yep, indeed 💦

@mattmoor
Member

mattmoor commented Oct 6, 2020

@vdemeester This was probably my bad when I switched things over to the :nonroot base images.

:nonroot has used 65532 since its inception, but Tekton only moved (relatively) recently (though still probably a few months back?).

Having just fixed something similar for our rebuild of Contour, I'd love to know if/how we can make this fail on GKE to avoid future issues like this 😬

cc @mikedanese @cjcullen

@mattmoor
Member

mattmoor commented Oct 6, 2020

cc @BenTheElder too since I'd like it to fail on KinD too 😅

@BenTheElder

BenTheElder commented Oct 6, 2020

@mattmoor this is a difference between containerd and docker rather than minikube and kind, and has to do with default permissions.

we've already attempted to resolve this upstream. kubernetes-sigs/kind#1331

someone needs to land containerd/cri#1397

@BenTheElder

BenTheElder commented Oct 29, 2020

Once containerd/containerd#4669 (I'm carrying forward containerd/cri#1397 after the repo merge and Lantao moving on) lands, this will fail on future kind clusters.
EDIT: We ship containerd fixes quickly. But it has to land first... We avoid forks.
