libnvidia-egl-wayland.so.1.1.13 not found, podman 5.2.2 update #23935

Closed
charles-m-knox opened this issue Sep 11, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

charles-m-knox commented Sep 11, 2024

Issue Description

On my Arch Linux system with a 4090 running podman 5.2.2, the following command fails after today's update:

podman run -it -d \
    --pod my-own-pod \
    --gpus all \
    my-own-image:latest

The error:

Error: crun: cannot stat `/usr/lib/libnvidia-egl-wayland.so.1.1.13`: No such file or directory: OCI runtime attempted to invoke a command that was not found

This can be temporarily resolved by creating a symlink to the 1.1.16 version, which does exist in /usr/lib:

cd /usr/lib
sudo ln -s libnvidia-egl-wayland.so.1.1.16 libnvidia-egl-wayland.so.1.1.13

I have also posted this at NVIDIA/nvidia-container-toolkit#692.

Steps to reproduce the issue

  1. Install Arch Linux on bare metal with NVIDIA drivers, and run pacman -Syu to bring everything up to date. Ensure nvidia-container-toolkit is installed too.
  2. Attempt to run the command shown above, or probably any other podman command with the --gpus flag.

Describe the results you received

Error: crun: cannot stat `/usr/lib/libnvidia-egl-wayland.so.1.1.13`: No such file or directory: OCI runtime attempted to invoke a command that was not found

Describe the results you expected

I should not need to create a symlink to resolve the issue; it should be able to detect the correct .so file.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.2
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-1:2.1.12-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264'
  cpuUtilization:
    idlePercent: 96.9
    systemPercent: 1.38
    userPercent: 1.72
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 1917
  hostname: "4090"
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.10.9-arch1-2
  linkmode: dynamic
  logDriver: journald
  memFree: 109240987648
  memTotal: 134953447424
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1
    path: /usr/lib/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-2024_09_06.6b38f07-1
    version: |
      pasta 2024_09_06.6b38f07
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 0h 21m 14.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 42
    paused: 0
    running: 41
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 1999843098624
  graphRootUsed: 1527613612032
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 40
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1724352649
  BuiltTime: Thu Aug 22 11:50:49 2024
  GitCommit: fcee48106a12dd531702d729d17f40f6e152027f
  GoVersion: go1.23.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Arch Linux on bare metal
AMD Ryzen CPU
NVIDIA 4090

Additional information

This only happens when trying to use the --gpus flag. Also, I am not using Wayland, only X11.

lsm5 (Member) commented Sep 13, 2024

I wonder if it's an Arch Linux packaging issue.

@giuseppe what do you think?

Luap99 (Member) commented Sep 17, 2024

Under the hood, --gpus just maps to the CDI interface, for which you need to generate the CDI config as per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#procedure

Notice the last sentence of that procedure:

"If you change the device or CUDA driver configuration, you must generate a new CDI specification. A configuration change can occur when MIG devices are created or removed, or when the driver is upgraded."

So yes, it is expected that you have to regenerate the config after a driver upgrade.
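
As a rough illustration of the point above, here is a minimal sketch of how one might check whether an existing CDI spec still references host files that a driver upgrade has removed. The function name find_stale_cdi_paths is hypothetical, and the spec location /etc/cdi/nvidia.yaml is an assumption taken from the linked NVIDIA procedure; a spec may live elsewhere on a given system. The parsing is deliberately naive: it only looks at unquoted, single-line `hostPath:` entries.

```shell
#!/bin/sh
# Sketch only: list every hostPath in a CDI spec that no longer exists
# on the host. find_stale_cdi_paths is a hypothetical helper, not part
# of podman or nvidia-container-toolkit.
find_stale_cdi_paths() {
    # Default location is an assumption based on the NVIDIA CDI procedure.
    spec="${1:-/etc/cdi/nvidia.yaml}"

    # Strip the leading indentation, optional list dash, and the
    # "hostPath:" key, leaving only the path on each matching line.
    sed -n 's/^[[:space:]-]*hostPath:[[:space:]]*//p' "$spec" |
    while IFS= read -r path; do
        # Print only the paths that are missing from the host.
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

After sourcing the snippet, running `find_stale_cdi_paths /etc/cdi/nvidia.yaml` prints each path the spec still references but the host no longer has. If anything is listed, regenerating the spec with `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`, as described in the linked procedure, should be preferable to symlinking old library versions by hand.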

Luap99 closed this as not planned on Sep 17, 2024