
OOM metrics undetected #2848

Open
mikelo opened this issue Aug 1, 2022 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

mikelo commented Aug 1, 2022

What happened:

I wanted to deliberately create a pod that would go "out of memory", but it seems to run fine.

What you expected to happen:

The pod should switch to status "OOMKilled" right after starting.

How to reproduce it (as minimally and precisely as possible):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch deployment metrics-server -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","args":["--cert-dir=/tmp", "--secure-port=4443", "--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"]}]}}}}'
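
A quick sanity check before going further (assuming the patched metrics-server rolls out cleanly) is to wait for the rollout and confirm the metrics API answers:

$ kubectl -n kube-system rollout status deployment/metrics-server
$ kubectl top nodes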

apply the following deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oomkilled
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oomkilled
  template:
    metadata:
      labels:
        app: oomkilled
    spec:
      containers:
      - image: gcr.io/google-containers/stress:v1
        name: stress
        command: [ "/stress"]
        args: 
          - "--mem-total"
          - "104858000"
          - "--logtostderr"
          - "--mem-alloc-size"
          - "10000000"
        resources:
          requests:
            memory: 1Mi
            cpu: 5m
          limits:
            memory: 20Mi
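
To watch for the expected transition, a label-selector query along these lines should do (illustrative, assuming the Deployment above is applied unchanged):

$ kubectl get pods -l app=oomkilled -w
$ kubectl describe pod -l app=oomkilled | grep -A 3 'Last State'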

Anything else we need to know?:

Environment:

  • kind version: (use kind version):
    kind v0.14.0 go1.18.2 linux/amd64

  • Kubernetes version: (use kubectl version):
    Client Version: v1.24.3
    Kustomize Version: v4.5.4
    Server Version: v1.25.0-alpha.0.881+7c127b33dafc53

  • Docker version: (use docker info):
    $ docker version
    Client: Docker Engine - Community
      Version:      20.10.17
      API version:  1.41
      Go version:   go1.17.11
      Git commit:   100c701
      Built:        Mon Jun 6 23:03:59 2022
      OS/Arch:      linux/amd64
      Context:      default
      Experimental: true

    Server: Docker Engine - Community
      Engine:
        Version:      20.10.17
        API version:  1.41 (minimum version 1.12)
        Go version:   go1.17.11
        Git commit:   a89b842
        Built:        Mon Jun 6 23:01:39 2022
        OS/Arch:      linux/amd64
        Experimental: false
      containerd:
        Version:      1.6.6
        GitCommit:    10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
      runc:
        Version:      1.1.2
        GitCommit:    v1.1.2-0-ga916309
      docker-init:
        Version:      0.19.0
        GitCommit:    de40ad0

  • OS (e.g. from /etc/os-release):
    cat /etc/redhat-release
    Fedora release 36 (Thirty Six)
mikelo added the kind/bug label Aug 1, 2022
@stmcginnis (Contributor)

This doesn't seem to be a kind project issue. kind just gets the cluster up and running.

mikelo changed the title from "OOM" to "OOM metrics undetected" Aug 1, 2022
tkrishtop commented Aug 2, 2022

I used an example from the kubernetes docs to exceed a container's memory limit, ran it both on minikube and on kind, and reproduced the issue.

While minikube shows the OOMKilled status as expected, kind somehow leaves the pod in the Running state.

1/ minikube - oomkilled as expected

$ minikube start --driver=kvm2
😄  minikube v1.25.2 on Fedora 36
- snip -

$ kubectl create namespace mem-example
namespace/mem-example created

$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created

$ kubectl get pods -n mem-example
NAME            READY   STATUS      RESTARTS      AGE
memory-demo-2   0/1     OOMKilled   2 (23s ago)   30s

2/ kind - the pod is running

$ kind create cluster
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.24.0) 🖼 
- snip -
 
$ kubectl create namespace mem-example
namespace/mem-example created

$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created

$ kubectl get pods -n mem-example
NAME            READY   STATUS    RESTARTS   AGE
memory-demo-2   1/1     Running   0          12s
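
One extra data point that might help here (a hypothetical follow-up, not part of the run above) is whether the kubelet ever recorded a kill at all, which the namespace events should show:

$ kubectl get events -n mem-example --field-selector involvedObject.name=memory-demo-2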

@BenTheElder (Member)

IIRC we have issues with nested resource limits.

@tkrishtop which driver are you using with minikube? VM or docker?

It's worth pointing out that kind inherently shares a single host kernel between the nodes and the host, so some other things also cannot possibly be isolated, at least with Linux as it stands today, e.g. inotify limits, time, binfmt_misc, ...
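
For illustration (assuming the docker provider and the default node container name), the shared kernel is easy to see: the "node" reports the same total memory as the host, because it is just a container on the same machine:

$ grep MemTotal /proc/meminfo
$ docker exec kind-control-plane grep MemTotal /proc/meminfo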

@BenTheElder (Member)

Bounding node resources also has an existing open tracking issue. There's not a good answer currently.

@BenTheElder (Member)

xref: #877

@tkrishtop

@tkrishtop which driver are you using with minikube? VM or docker?

@BenTheElder kvm2. Sorry, I forgot to include it; I've updated the logs now.

@tkrishtop

minikube with driver=podman results in Error/CrashLoopBackOff (not in OOMKilled as with minikube on kvm2):

$ minikube start --driver=podman
😄  minikube v1.25.2 on Fedora 36
✨  Using the podman driver based on user configuration

$ kubectl create namespace mem-example
namespace/mem-example created

$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created

$ kubectl get pods -n mem-example
NAME            READY   STATUS             RESTARTS      AGE
memory-demo-2   0/1     CrashLoopBackOff   3 (16s ago)   64s

$ kubectl logs memory-demo-2 -n mem-example
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [1] (415) <-- worker 7 got signal 9
stress: WARN: [1] (417) now reaping child worker processes
stress: FAIL: [1] (421) kill error: No such process
stress: FAIL: [1] (451) failed run completed in 0s

$ kubectl get events -n mem-example
LAST SEEN   TYPE      REASON             OBJECT              MESSAGE
6m29s       Normal    Scheduled          pod/memory-demo-2   Successfully assigned mem-example/memory-demo-2 to minikube
6m9s        Normal    Pulling            pod/memory-demo-2   Pulling image "polinux/stress"
6m26s       Normal    Pulled             pod/memory-demo-2   Successfully pulled image "polinux/stress" in 2.749523255s
6m7s        Normal    Created            pod/memory-demo-2   Created container memory-demo-2-ctr
6m7s        Normal    Started            pod/memory-demo-2   Started container memory-demo-2-ctr
6m24s       Normal    Pulled             pod/memory-demo-2   Successfully pulled image "polinux/stress" in 1.251934697s
5m54s       Warning   BackOff            pod/memory-demo-2   Back-off restarting failed container
6m7s        Normal    Pulled             pod/memory-demo-2   Successfully pulled image "polinux/stress" in 1.291489886s

$ kubectl top node
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   133m         1%     647Mi           2% 

$ kubectl top pods -n mem-example
error: Metrics not available for pod mem-example/memory-demo-2, age: 54m37.283605721s
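
If it helps with the missing pod metrics, the metrics-server logs can be checked directly (assuming it runs as a Deployment named metrics-server in kube-system, as in the stock manifests):

$ kubectl -n kube-system logs deploy/metrics-server --tail=50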

mikelo (Author) commented Aug 3, 2022

$ minikube start --driver=podman --memory=2g --container-runtime=containerd
😄 minikube v1.26.0 on Fedora 36
▪ MINIKUBE_ROOTLESS=true
✨ Using the podman driver based on user configuration
📌 Using rootless Podman driver
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
E0803 12:37:39.000031 2062947 cache.go:203] Error downloading kic artifacts: not yet implemented, see issue #8426
🔥 Creating podman container (CPUs=2, Memory=2048MB) ...
📦 Preparing Kubernetes v1.24.1 on containerd 1.6.6 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl create namespace mem-example
namespace/mem-example created

$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created

$ k get pod -n mem-example
NAME            READY   STATUS    RESTARTS   AGE
memory-demo-2   1/1     Running   0          7m50s

$ kubectl logs -f memory-demo-2 -n mem-example
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

$ k top pod -n mem-example
NAME            CPU(cores)   MEMORY(bytes)
memory-demo-2   487m         99Mi

What is your container-runtime? I don't understand why we don't get the same results... I'm probably missing something fundamental. The only way I can reproduce it is with the kvm2 driver.
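
For reference, the runtime each node reports is visible in the wide node listing (the CONTAINER-RUNTIME column):

$ kubectl get nodes -o wide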
