containerd versus dockerd WORKDIR non-root permissions #1331

Open
michaelbannister opened this issue Feb 13, 2020 · 24 comments
Assignees
Labels: kind/external (upstream bugs), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)

Comments

@michaelbannister

What happened:
An image whose WORKDIR is a directory accessible to only one user is run in a Pod with securityContext.runAsUser set to a different UID. Kind runs the pod just fine, but Kubernetes fails with an error like: failed to create containerd task: OCI runtime create failed: container_linux.go:346: starting container process caused \"chdir to cwd (\\\"/home/nonroot\\\") set in config.json failed: permission denied\": unknown

Kind only fails in the same way as "normal" Kubernetes if the securityContext is also configured to drop all capabilities.
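As a rough illustration of why the chdir can fail for one UID yet succeed once capabilities are in play, here is a minimal Python sketch of the directory search-permission check. It is a simplified model, not the kernel's or runc's actual logic: the function name can_chdir, the owner UID 1000, and the mode 0700 are assumptions made for illustration, and UID 1337 is borrowed from the docker run example later in this thread.

```python
import stat

# Capability names used by this simplified model.
CAP_DAC_OVERRIDE = "CAP_DAC_OVERRIDE"
CAP_DAC_READ_SEARCH = "CAP_DAC_READ_SEARCH"


def can_chdir(mode, owner_uid, owner_gid, uid, gids=(), effective_caps=()):
    """Return True if a process could chdir into a directory.

    Simplified model: only the classic mode bits and two capabilities
    are considered. ACLs, LSMs, and parent directories are ignored.
    """
    # Either capability bypasses the directory search (execute) check.
    if CAP_DAC_OVERRIDE in effective_caps or CAP_DAC_READ_SEARCH in effective_caps:
        return True
    # Exactly one permission class applies: owner, then group, then other.
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# Hypothetical WORKDIR: mode 0700, owned by UID/GID 1000.
owner_ok = can_chdir(0o700, 1000, 1000, uid=1000, gids=(1000,))
other_denied = can_chdir(0o700, 1000, 1000, uid=1337, gids=(1337,))
other_with_cap = can_chdir(0o700, 1000, 1000, uid=1337, gids=(1337,),
                           effective_caps={CAP_DAC_OVERRIDE})
```

Under this model, a different UID is refused unless it holds an effective capability that bypasses the check, which would be consistent with Kind failing only when all capabilities are dropped.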

What you expected to happen:
Kind should fail to run the container in the same way as Kubernetes.

How to reproduce it (as minimally and precisely as possible):
See https://github.com/michaelbannister/distroless-permissions-test for a worked example.

Anything else we need to know?:
This came up while working on this Istio PR: istio/istio#20854

Environment:

  • kind version: kind v0.7.0 go1.13 darwin/amd64
  • Kubernetes version: v1.15.5 (Docker for Desktop on macOS)
  • Docker version: 19.03.5
  • OS: macOS 10.15.3

However, this also occurs on the Istio project's testing infrastructure. I don't know the details of that setup beyond the fact that it uses Kind.

@michaelbannister michaelbannister added the kind/bug label (Categorizes issue or PR as related to a bug.) Feb 13, 2020
@aojea
Contributor

aojea commented Feb 13, 2020

/cc @mauilion

@BenTheElder
Member

/assign

@BenTheElder
Member

Can you define what "kubernetes" is when not KIND? KIND is also kubernetes :+)
It might be containerd vs docker as the node runtime, or something with KIND ...

will investigate O(soon)

@michaelbannister
Author

In my case I tested this against the Kubernetes installed by Docker Desktop on macOS. I might be able to try it on GKE, will get back to you…

@michaelbannister
Author

GKE v1.14.8-gke.33 with Docker as the container runtime, when running the job defined in job.yaml:

Error: failed to start container "distroless-permissions-test": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied": unknown

@BenTheElder
Member

repro job.yaml working in kind locally on my linux workstation.

doing some follow up on block device testing issues plaguing k8s CI (hopefully fixed now, and they don't work at all in other local clusters .. 🙃 ) ref: #1248 then back to this one ...

I see some upstream bugs related to this but they appear to be fixed before 1.15

@michaelbannister
Author

Sorry, just to be sure: when you say working - you mean it's refusing to start the container in the way I described? Or that it is running the container in kind?

@michaelbannister
Author

Feels like it could be something to do with Linux capabilities? (which I barely understand TBH)

@BenTheElder
Member

er it is creating the container and the job exits success with kubectl apply -f https://raw.githubusercontent.com/michaelbannister/distroless-permissions-test/master/job.yaml

@michaelbannister
Author

OK, but if you run the same job on GKE it will fail to run with the error I've shown. Ditto if you just ask Docker to run it as a different user: docker run --rm -it -u 1337:1337 michaelbannister/distroless-permissions-test.
This discrepancy doesn't look right, which is why I've raised the issue.

However, Kind's behaviour changes if you drop all capabilities, as in job-drop-caps.yaml – it will not run the container. So I wonder if there is something about capabilities (I was trying to read up on effective, ambient, permitted caps etc but I gave up in confusion).
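One way to compare the effective capability sets of the two runtimes is to read the CapEff line of /proc/self/status from inside the container under each one. The following is a minimal Python sketch of decoding that mask. The helper names parse_cap_eff and has_cap and the sample mask are assumptions for illustration; the bit positions are those defined in linux/capability.h.

```python
# Bit positions from linux/capability.h.
CAP_DAC_OVERRIDE = 1
CAP_DAC_READ_SEARCH = 2


def parse_cap_eff(status_text):
    """Return the CapEff mask from /proc/<pid>/status text as an int."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, bit):
    """Return True if the capability at the given bit position is set."""
    return bool((mask >> bit) & 1)


# Sample status text with an illustrative mask (not taken from this thread).
sample = "Name:\tsh\nCapEff:\t00000000a80425fb\n"
mask = parse_cap_eff(sample)
dac_override = has_cap(mask, CAP_DAC_OVERRIDE)
dac_read_search = has_cap(mask, CAP_DAC_READ_SEARCH)
```

Inside a running container the input would come from open("/proc/self/status").read(). A process whose mask has neither bit set would be subject to the ordinary permission bits on its working directory.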

@BenTheElder
Member

BenTheElder commented Feb 13, 2020 via email

@BenTheElder
Member

BenTheElder commented Feb 13, 2020 via email

@BenTheElder
Member

on GKE with a COS containerd 1.13.12-gke.25 node pool kubectl apply -f https://raw.githubusercontent.com/michaelbannister/distroless-permissions-test/master/job.yaml also results in:

bentheelder@cloudshell:~ (bentheelder-kind-dev)$ kubectl get po
NAME                                READY   STATUS      RESTARTS   AGE
distroless-permissions-test-z87qq   0/1     Completed   0          6s

@BenTheElder
Member

looked at this with @Random-Liu a bit just now, possibly a containerd bug?

@BenTheElder BenTheElder changed the title from "Kind runs containers which Kubernetes disallows" to "containerd versus dockerd WORKDIR non-root permissions" Feb 14, 2020
@BenTheElder BenTheElder added the kind/external label (upstream bugs) and removed the kind/bug label (Categorizes issue or PR as related to a bug.) Feb 14, 2020
@BenTheElder BenTheElder added the priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) and lifecycle/active (Indicates that an issue or PR is actively being worked on by a contributor.) labels Feb 14, 2020
@BenTheElder
Member

If this becomes a major issue we can work back in dockerd support for the nodes, but I think this is some subtle difference involving either a bug in containerd/containerd-cri or docker/dockershim and we should probably get it fixed upstream.

@BenTheElder
Member

per @Random-Liu appears to be a difference in default capability list (?)

@BenTheElder
Member

Fix pending in containerd/cri#1397

We'll pull that into kind once it merges in containerd.

@devadigar

Hi @BenTheElder, when I tried to install my application on a KIND cluster, it didn't come up because it fails with permission denied on the '/opt' directory. The application runs as a non-root user.
I'm not sure whether the issue I'm running into is related to this ticket.

Will your pending fix address this issue? Any suggestions would be welcome, please let me know.

@BenTheElder
Member

Hi, can you tell me more about your application setup?

On most hosts /opt is owned by root, so I would not expect a non-root user to be able to write to a system directory like this. In that case the issue is with your application deployment, not kind/containerd/....

For example on my workstation:

$ stat /opt
  File: /opt
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd01h/64769d    Inode: 2223873     Links: 8
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-01-06 16:10:50.162169909 -0800
Modify: 2019-04-03 11:03:01.281234979 -0700
Change: 2019-04-03 11:03:01.281234979 -0700
 Birth: -

@BenTheElder
Member

(also please use a new support type ticket for this, thanks!)

@BenTheElder
Member

upstream patch is LGTM but not merged yet. still monitoring

@BenTheElder
Member

further discussion notes that the default capabilities have changed over time in container runtimes and may change again. while cri-containerd would prefer to match dockerd and does consider this a bug, explicit capabilities should be preferred when capabilities are needed.
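To make the "explicit capabilities" advice concrete, here is a minimal Python sketch that builds a container securityContext which drops every capability and adds back only the ones named. The helper name security_context is hypothetical, and UID 1337 and NET_BIND_SERVICE are used only as examples; the field names (runAsUser, capabilities.add, capabilities.drop) are those of the Kubernetes container securityContext.

```python
def security_context(run_as_user, add_caps=()):
    """Build a securityContext dict that drops ALL capabilities,
    then adds back only the explicitly requested ones."""
    ctx = {
        "runAsUser": run_as_user,
        "capabilities": {"drop": ["ALL"]},
    }
    if add_caps:
        # Kubernetes capability names omit the CAP_ prefix.
        ctx["capabilities"]["add"] = sorted(add_caps)
    return ctx


# No capabilities at all: the container relies purely on file permissions.
strict = security_context(1337)

# Explicitly request a single capability instead of relying on runtime defaults.
with_bind = security_context(1337, add_caps={"NET_BIND_SERVICE"})
```

With a context like this the result should not depend on which default capability list a given runtime happens to ship.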

we already regularly upgrade containerd to the latest patches on the latest release branch, so when this merges there we'll pick it up.

sending another poke there and closing this out to track upstream.

@BenTheElder
Member

Follow-up: The PR to containerd/cri fell through (OP moved on to different work). Since containerd/cri has merged into containerd/containerd, I've sent a carry in containerd/containerd#4669.

Once that's in this will actually be fixed.

@BenTheElder BenTheElder reopened this Oct 29, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) and removed the lifecycle/active label (Indicates that an issue or PR is actively being worked on by a contributor.) Jan 27, 2021
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Feb 26, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from k8s-ci-robot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@BenTheElder BenTheElder reopened this Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@BenTheElder
Member

I tried to carry forward the upstream change in containerd/containerd#4669, but it's stuck and I'm running low on bandwidth to keep after this. I think opencontainers/runc#2712 is relevant and may have fixed the issue, but I've not had time to really look.
