NVIDIA Driver 555.85 with WSL2 returns error 500 "named symbol not found" when running CUDA apps inside container #520

Closed
cliffwoolley opened this issue May 29, 2024 · 2 comments

cliffwoolley commented May 29, 2024

NVIDIA drivers 555.xx and newer for Windows have added a library called libnvdxgdmal.so.1 that must be mapped into the container for CUDA to continue working in containers under WSL2.

The nvidia-container-toolkit must be updated in order to add the reference to this new library.

If 555.xx or newer drivers are used with an nvidia-container-toolkit that has not been updated, the library is missing from the container and CUDA initialization returns error 500 "named symbol not found" (CUDA_ERROR_NOT_FOUND).
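
As a rough way to tell this failure apart from other CUDA initialization errors, the check described above could be sketched in Python as follows. This is only an illustration: the helper names (`find_library`, `diagnose`) are hypothetical, and the default search directory `/usr/lib/wsl` is an assumption about where the WSL2 driver libraries end up inside the container, so pass the directories that apply to your image.

```python
import os

# Library name taken from the issue text.
REQUIRED_LIB = "libnvdxgdmal.so.1"
# Assumed default location of the WSL2 driver libraries; adjust as needed.
DEFAULT_SEARCH_DIRS = ("/usr/lib/wsl",)
# Numeric value of CUDA_ERROR_NOT_FOUND, the code reported in this issue.
CUDA_ERROR_NOT_FOUND = 500


def find_library(name, search_dirs=DEFAULT_SEARCH_DIRS):
    """Return sorted paths under search_dirs whose file name equals `name`."""
    matches = []
    for root_dir in search_dirs:
        # os.walk yields nothing for directories that do not exist.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def diagnose(cu_init_result, search_dirs=DEFAULT_SEARCH_DIRS):
    """Combine a cuInit return code with a library scan into a short hint."""
    if cu_init_result == 0:
        return "CUDA initialized; nothing to do."
    found = find_library(REQUIRED_LIB, search_dirs)
    if cu_init_result == CUDA_ERROR_NOT_FOUND and not found:
        return (f"{REQUIRED_LIB} is not visible in this container; "
                "update nvidia-container-toolkit (or Docker Desktop).")
    return f"cuInit returned {cu_init_result}; cause not covered by this check."
```

Calling `diagnose(500)` inside the affected container should then point at the missing library, while any other non-zero code is reported as outside the scope of this check.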

This is cited in https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/543186/geforce-grd-55585-feedback-thread-released-52124/3453787/ as bug 4668302, "PyTorch-CUDA Docker not compatible with CUDA 12.5/GRD 555.85".

For Docker Desktop users the fix has been released as part of the Docker Desktop 4.31.0 update.

Author

cliffwoolley commented May 29, 2024

This is fixed in nvidia-container-toolkit 1.14.4 and in 1.15.0.

https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/239
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/241
https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/529
https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/534
79acd7a
c1eae0d

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html#packaging-changes

Added detection of libnvdxgdmal.so.1 on WSL2 systems. This library is required for newer driver versions.

  • If you see this symptom using Docker CE on Linux under WSL2, please update your nvidia-container-toolkit to 1.14.4 or newer.
  • If you see this symptom using Docker Desktop, a fix (an upgrade of the bundled nvidia-container-toolkit) is in progress; we will reply here when it is published. Until that fix is ready, Docker Desktop users should stay on NVIDIA Driver 552.xx or earlier. Update: a fix for this issue was included in the Docker Desktop 4.31 release, and users are advised to update to that version.
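
For the Docker CE case, the "1.14.4 or newer" rule can be checked mechanically. The sketch below is a minimal illustration with hypothetical helper names (`parse_version`, `has_wsl2_fix`); it assumes you already have the toolkit's version text from somewhere (for example the output of `nvidia-ctk --version`, which is an assumption about your install), and it ignores pre-release suffixes such as `-rc.1`.

```python
import re

# First fixed release named in this issue.
MIN_FIXED = (1, 14, 4)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def has_wsl2_fix(version_text):
    """Return True if the reported toolkit version is 1.14.4 or newer."""
    version = parse_version(version_text)
    if version is None:
        raise ValueError(f"no MAJOR.MINOR.PATCH version found in {version_text!r}")
    # Tuples compare element by element, so (1, 15, 0) >= (1, 14, 4) holds.
    return version >= MIN_FIXED
```

For example, `has_wsl2_fix("1.14.3")` is false and `has_wsl2_fix("1.15.0")` is true. Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.9.0" as newer than "1.14.4".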

Member

elezar commented Jun 25, 2024

Closing this since Docker Desktop 4.31 has been released.

@elezar elezar closed this as completed Jun 25, 2024
@elezar elezar self-assigned this Jun 25, 2024