
bug: error when trying to initialize libovsdb NB client: no space left on device #372

Open
Dr0p42 opened this issue Nov 24, 2023 · 2 comments


Dr0p42 commented Nov 24, 2023

Hello, while upgrading an OKD cluster from 4.11.0-0.okd-2022-10-28-153352 to 4.11.0-0.okd-2022-12-02-145640 I got the following error:

F1124 20:44:04.871535       1 ovnkube.go:133] error when trying to initialize libovsdb NB client: no space left on device

This occurred while there was still plenty of space on the master node. After a lot of testing, I tried to check whether it was actually a port-binding issue rather than a storage issue, and I saw that an ovnkube-node pod was already running on the same machine. So I tried to:

  • Delete pod: ovnkube-master
  • Delete pod: ovnkube-node

After that, the master was finally able to boot past that error.

I don't know if anything can be done to improve the error message "error when trying to initialize libovsdb NB client: no space left on device", which I find misleading.

I think the log message is being displayed from this line: https://github.com/ovn-org/ovn-kubernetes/blob/ac6820df0b338a246f10f412cd5ec903bd234694/go-controller/cmd/ovnkube/ovnkube.go#L486

But I see that the code just prints the error as-is, so if anything can be done it is probably in this repo, which is why I am opening the issue here.

I can provide more logs if needed.
Best,
Maxime

@halfcrazy (Contributor) commented:

This occurred while there was still a lot of space on the master node.

Have you checked inodes? Is the pod mounting a hostPath or using a PV?
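(For reference, a quick way to run this check on the node; the specific hostPath directories are taken from the pod spec shared later in this thread and may differ on other setups.)

```shell
# `df -h` reports block usage; `df -i` reports inode usage. A filesystem can
# return ENOSPC ("no space left on device") with free bytes remaining if it
# has run out of inodes.
df -i /

# hostPath directories used by the ovnkube pod (paths from the pod spec in
# this thread); ignore errors if a path does not exist on this machine.
df -i /var/lib/ovn /var/run/openvswitch /var/run/ovn 2>/dev/null || true
```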


Dr0p42 commented Nov 27, 2023

This occurred while there was still a lot of space on the master node.

Have you checked inodes? Is the pod mounting a hostPath or using a PV?

Hello @halfcrazy, the pod is mounting volumes using hostPath, not a PV.

  • I updated the command of that specific container in the pod to create dummy files in those mounts, and it worked.
  • I also opened a terminal on the node and was able to create files there as well.

Those are the volumes and volumeMounts:

      volumeMounts:
      # hostPath
        - name: systemd-units
          readOnly: true
          mountPath: /etc/systemd/system
        - name: etc-openvswitch
          mountPath: /etc/openvswitch/
        - name: etc-openvswitch
          mountPath: /etc/ovn/
        - name: var-lib-openvswitch
          mountPath: /var/lib/openvswitch/
        - name: run-openvswitch
          mountPath: /run/openvswitch/
        - name: run-ovn
          mountPath: /run/ovn/
        - name: ovnkube-config
          mountPath: /run/ovnkube-config/
        - name: env-overrides
          mountPath: /env
        - name: ovn-cert
          mountPath: /ovn-cert
        - name: ovn-ca
          mountPath: /ovn-ca
        - name: kube-api-access-qgltv
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount

  volumes:
    - name: systemd-units
      hostPath:
        path: /etc/systemd/system
        type: ''
    - name: etc-openvswitch
      hostPath:
        path: /var/lib/ovn/etc
        type: ''
    - name: var-lib-openvswitch
      hostPath:
        path: /var/lib/ovn/data
        type: ''
    - name: run-openvswitch
      hostPath:
        path: /var/run/openvswitch
        type: ''
    - name: run-ovn
      hostPath:
        path: /var/run/ovn
        type: ''
    - name: ovnkube-config
      configMap:
        name: ovnkube-config
        defaultMode: 420
    - name: env-overrides
      configMap:
        name: env-overrides
        defaultMode: 420
        optional: true
    - name: ovn-ca
      configMap:
        name: ovn-ca
        defaultMode: 420
    - name: ovn-cert
      secret:
        secretName: ovn-cert
        defaultMode: 420
    - name: ovn-master-metrics-cert
      secret:
        secretName: ovn-master-metrics-cert
        defaultMode: 420
        optional: true
    - name: kube-api-access-qgltv
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
          - configMap:
              name: openshift-service-ca.crt
              items:
                - key: service-ca.crt
                  path: service-ca.crt
        defaultMode: 420

I also checked crio.conf and /etc/containers/storage.conf; I mainly wanted to check for a storage limit in the overlayfs, but there was nothing interesting there.


Regarding the ports, it does not seem to make sense either, as there are no conflicting ports between ovnkube-master and ovnkube-node. I can share those YAMLs if you want.


Could it be that ovnkube-node was interacting with a file in the hostPath that ovnkube-master also uses, so that when I killed ovnkube-master and then ovnkube-node, that file finally got released, or something like that?

I am sorry, I really don't know this project well. Let me know if I can share anything that would help you understand this better.
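One more check that might be worth running on the node (an assumption on my part, not something confirmed anywhere in this thread): on Linux, inotify also returns ENOSPC when the per-user watch limit is exhausted, which surfaces as "no space left on device" even with plenty of disk and inodes free. The limits can be inspected with:

```shell
# Per-user inotify limits; hitting max_user_watches surfaces as ENOSPC,
# i.e. "no space left on device", even though no disk is full.
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances

# If the watch limit turned out to be the culprit, raising it would be a
# common mitigation (requires root; shown commented out because it is an
# assumption, not a confirmed fix for this issue):
# sysctl -w fs.inotify.max_user_watches=1048576
```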
