
Permission Denied Error attempting to start S3 Node Pod #211

Open
AlecAttwood opened this issue Jun 19, 2024 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@AlecAttwood

AlecAttwood commented Jun 19, 2024

/kind bug

When deploying the CSI S3 Driver to an EKS cluster, the node pod fails to start. Specifically, the s3-plugin container is in CrashLoopBackOff due to issues mounting a volume.

What happened?
When deploying the mountpoint-s3-csi-driver Helm chart, I encounter this error:

Error:
Failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/proc/1/mounts" to rootfs at "/host/proc/mounts": change mount propagation through procfd: mount /host/proc/mounts (via /proc/self/fd/6), flags: 0x44000: permission denied: unknown
Containers:
  s3-plugin:
    Container ID:  containerd://ac97f14f0d3493be036a3a45728b9738db6e62f1d080a8a6cf936b480f315740
    Image:         public.ecr.aws/mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver:v1.6.0
    Image ID:      public.ecr.aws/mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver@sha256:4479dd8e8b108ddf64da806c1d955f3c3fabcbc9ef8bbb85d299666ba8a4e4c1
    Port:          9808/TCP
    Host Port:     0/TCP
    Args:
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --v=4
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/proc/3512/mounts" to rootfs at "/host/proc/mounts": change mount propagation through procfd: mount /host/proc/mounts (via /proc/self/fd/6), flags: 0x44000: permission denied: unknown

What you expected to happen?

The Pods start correctly

How to reproduce it (as minimally and precisely as possible)?

Create a new EKS cluster and deploy the mountpoint-s3-csi-driver Helm chart.

Anything else we need to know?:
Since this is a permissions issue with the s3-plugin trying to mount a volume on the host, I assume it's due to my EKS setup. However, I haven't found much information on potential fixes. Usually, issues with mounting specifically under /proc are easy to fix: just don't mount to that path, since it's locked down. But the s3-plugin mount paths cannot be changed via the values file.


Looking for any potential fixes or things to try. Thanks.

Environment

  • Kubernetes version (use kubectl version): v1.29.4-eks-036c24b
  • Driver version: 1.7.0
    Using default values
@AlecAttwood AlecAttwood changed the title Permission Denided Error attemping to start S3 Node Pod Permission Denied Error attempting to start S3 Node Pod Jun 19, 2024
@monthonk
Contributor

Hi @AlecAttwood , what underlying operating system are you using for your hosts?

@AlecAttwood
Author

> Hi @AlecAttwood , what underlying operating system are you using for your hosts?

Amazon Linux:

sh-4.2$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

@AlecAttwood
Author

AlecAttwood commented Jul 2, 2024

@monthonk I've had a better look, and I'm pretty sure our nodes have SELinux policy rules that are blocking the S3 CSI Driver from mounting on /proc. I tried multiple combinations of SELinux config in the chart's values.yaml, which didn't help. And judging from the SELinux audit log, changing the seLinuxOptions in the values.yaml didn't apply the securityContext to the pod properly. Can you suggest anything else to try, or potential SELinux configs which might work on a more locked-down node?
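For anyone else trying to confirm this on their nodes, a sketch of checking for SELinux mount denials on the host (assumes auditd and the SELinux userland tools are installed; run on the node, not in the pod):

```shell
#!/bin/sh
# Check whether SELinux is enforcing on this node.
getenforce

# Search the audit log for recent AVC denials involving mounts
# (requires root; "mounton" is the permission seen in this issue).
ausearch -m avc -ts recent | grep -i mounton
```

If `ausearch` reports `denied { mounton }` entries referencing the container's rootfs under `/host/proc/mounts`, the failure is coming from SELinux policy rather than from the driver itself.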

@unexge unexge self-assigned this Jul 3, 2024
@unexge
Contributor

unexge commented Jul 4, 2024

Hey @AlecAttwood I'm trying to reproduce the issue you're having.

I created a new EKS cluster:

$ eksctl create cluster -f mp-csi-testing-cluster-helm.yaml
mp-csi-testing-cluster-helm.yaml
```yaml
# EKS Cluster setup for installing Mountpoint CSI driver via Helm.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: mp-csi-testing-cluster-helm
  region: eu-north-1

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: s3-csi-driver-sa
        namespace: kube-system
      roleName: eks-s3-csi-driver-role
      roleOnly: true
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "s3:ListBucket"
            Resource: "arn:aws:s3:::YOUR-BUCKET-NAME"
          - Effect: Allow
            Action:
              - "s3:GetObject"
              - "s3:PutObject"
              - "s3:AbortMultipartUpload"
              - "s3:DeleteObject"
            Resource: "arn:aws:s3:::YOUR-BUCKET-NAME/*"

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 1
```

and installed Mountpoint CSI driver Helm chart:

$ helm install aws-mountpoint-s3-csi-driver aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver \
    --namespace kube-system

One thing that might not be clear in our installation instructions: if you're using IAM Roles for Service Accounts (IRSA), they ask you to create the role only (i.e., without the service account):

$ eksctl create iamserviceaccount \
    --name s3-csi-driver-sa \
    --namespace kube-system \
    --cluster $CLUSTER_NAME \
    --attach-policy-arn $ROLE_ARN \
    --approve \
    --role-name $ROLE_NAME \
    --region $REGION \
    --role-only # <-- Here

and our Helm chart creates a service account named s3-csi-driver-sa for you, but in order for IRSA to work you need to annotate that service account with your Role ARN. So you should have:

$ kubectl describe sa s3-csi-driver-sa -n kube-system | rg eks.amazonaws.com/role-arn
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::account:role/eks-s3-csi-driver-role

in s3-csi-driver-sa; otherwise you might get a permission denied error, but that would come from Mountpoint while trying to perform ListObjects or similar, not at the container initialization phase.
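If the annotation turns out to be missing, it can be added manually; a sketch (the account ID in the ARN is a placeholder, and the DaemonSet name is an assumption; check yours with `kubectl get ds -n kube-system`):

```shell
#!/bin/sh
# Annotate the Helm-created service account with the IRSA role ARN.
# Replace the ARN below with your own role's ARN.
kubectl annotate serviceaccount s3-csi-driver-sa \
    --namespace kube-system \
    eks.amazonaws.com/role-arn=arn:aws:iam::111122223333:role/eks-s3-csi-driver-role

# Restart the node pods so they pick up the new credentials.
kubectl rollout restart daemonset s3-csi-node -n kube-system
```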

I tried with both 1.7.0 (latest) and 1.6.0 versions of CSI driver and also used Amazon Linux as my node OS but couldn't reproduce the issue you're having.

Did you do any other configuration that you think might be related?

Btw, you should be able to override seLinuxOptions by passing --set node.seLinuxOptions.level="..." to Helm. You can see the output of the manifests without applying them:

$ helm install aws-mountpoint-s3-csi-driver aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver \
    --namespace kube-system \
    --set node.seLinuxOptions.level="..." \
    --dry-run --debug
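For a quicker check that the option actually gets rendered, one could also pipe the templated manifests through grep (a sketch; the level value and grep context sizes are arbitrary):

```shell
#!/bin/sh
# Render the chart locally and confirm seLinuxOptions appears in the
# generated securityContext (no changes are applied to the cluster).
helm template aws-mountpoint-s3-csi-driver aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver \
    --namespace kube-system \
    --set node.seLinuxOptions.level="s0" \
  | grep -B2 -A4 'seLinuxOptions'
```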

@AlecAttwood
Author

Hi @unexge,

> Did you do some other configuration you think it might be related?

I mentioned above that our nodes are hardened with extra CIS hardening and have extra SELinux policies added. I'm pretty sure this is what's causing the issue. The pods never actually start, and there are SELinux audit logs on our nodes listing denied actions when mounting on /proc/mounts.

> you should be able to override seLinuxOptions by passing --set node.seLinuxOptions.level="..." to Helm

I did this, then queried the security contexts for all the pods and didn't see anything. Every time I changed the SELinux options and re-deployed, I looked at the SELinux audit logs and they were exactly the same, implying that setting the values didn't change anything or didn't apply the security context. Which is weird; I'm not 100% sure if that's an issue with the Helm chart or with the logging.

I'll continue to investigate. I'm still convinced that if I can set the right SELinux permissions it should work, even with our hardened AMI. Are there any other combinations of SELinux config to try? I'm not that familiar with it; I'd assume the default config should be enough, but it's not working in this case.
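One way to see what securityContext (if any) the running containers actually got is to dump the pod spec and slice out the s3-plugin container (a sketch; the awk range bounds are an assumption about field ordering in the rendered YAML):

```shell
#!/bin/sh
# Dump the pod specs in kube-system and check which securityContext, if
# any, is set on the s3-plugin container.
kubectl get pods -n kube-system -o yaml \
  | awk '/name: s3-plugin/,/volumeMounts/' \
  | grep -A3 securityContext
```

No output here would mean no securityContext reached the container, which would match the observation that changing the Helm values had no visible effect.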

@unexge
Contributor

unexge commented Jul 8, 2024

Hey @AlecAttwood,

I was able to reproduce the issue with /proc/mounts on AL2023 and looking into it.

Looking at the audit log I got from my host:

type=AVC msg=audit(1720166060.453:300): avc:  denied  { mounton } for  pid=2584 comm="runc:[2:INIT]" path="/run/containerd/io.containerd.runtime.v2.task/k8s.io/cd2c4eb5d338fb23f3d3e537bd948aede52f33a0defbc05a5cd19a5c4ef808a8/rootfs/host/proc/mounts" dev="proc" ino=20684 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=file permissive=1

Seems like it fails when runc/containerd tries to mount /proc/mounts, which happens before any of our containers/pods run (that's probably why you don't see any difference when you change SELinux settings). We probably need to change SELinux settings for runc/containerd (either via some configuration runc/containerd exposes, similar to Kubernetes' seLinuxOptions, or via SELinux transition policies, though I'm not an expert and not sure if it's possible) to allow them to mount on /proc/mounts.

I'm looking into whether it's possible to get rid of the /proc/mounts mount altogether.

@AlecAttwood
Author

Thanks for looking into it. That audit log looks almost identical to what I was seeing. On my side, we're going to add a temporary SELinux policy on our nodes to allow mounting on /proc. I'll keep an eye on this issue and upgrade the chart when a fix is released. Thanks again.
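A temporary policy like the one described can be generated straight from the recorded denials with audit2allow; a sketch (the module name is arbitrary; run as root on the node, and note this deliberately loosens the hardened policy until a driver-side fix ships):

```shell
#!/bin/sh
# Build a local SELinux policy module from the mounton denials in the
# audit log, then load it.
ausearch -m avc -ts recent | grep mounton | audit2allow -M s3csi_proc_mount
semodule -i s3csi_proc_mount.pp

# To remove the workaround later:
# semodule -r s3csi_proc_mount
```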

@unexge unexge added the bug Something isn't working label Jul 30, 2024
@DWS-guy

DWS-guy commented Sep 11, 2024

Any updates on this issue? I am experiencing a similar problem with mounting /proc/mounts inside a kind cluster, though I am not using SELinux

@unexge
Contributor

unexge commented Sep 12, 2024

@DWS-guy no updates yet unfortunately. Are you getting mount /host/proc/mounts (via /proc/self/fd/6), flags: 0x44000: permission denied: unknown as well?

@DWS-guy

DWS-guy commented Sep 12, 2024

@unexge Correct, I am getting that exact error. SELinux is not present on my system

@dannycjones
Contributor

> Any updates on this issue? I am experiencing a similar problem with mounting /proc/mounts inside a kind cluster, though I am not using SELinux

We are hoping to remove the dependency on /proc/mounts at some point, but I have nothing to share at this time.

I'd recommend opening a new bug report with your logs so that we can investigate, although I would note that we don't officially support kind clusters, only open-source Kubernetes (K8S) and Amazon EKS.
