Unexpected error from cudaGetDeviceCount() #694
Comments
Can confirm, seeing the same with another RTX 3080 and driver 555.85.
Thanks for the heads up. I will pin this issue for now. Maybe an update to the NVIDIA Container Toolkit is necessary?
@AbdBarho Reproduced it. Also tried with local CUDA.
Local CUDA details:
Complete log:
Hey guys, I have a similar issue to yours. Can confirm that downgrading to 552.44 fixes the issue.
I can confirm that the configuration below can run the system without any issues:
And in
ref:
I have updated the containers. Please check again with the latest master; if the issue still persists, please reopen this issue.
Issue persists.
I just checked the Nvidia driver feedback thread and it's actually a listed known issue:
@MarioLiebisch The list of supported versions can be found here: https://pytorch.org/
There's a (partial) fix available: NVIDIA/nvidia-container-toolkit#520
Docker Desktop 4.31 was released yesterday and includes NVIDIA Container Toolkit 1.15.0, which resolves this issue.
Ah, very nice. Seems like I barely missed the release when looking for updates earlier yesterday.
Can confirm. On Linux, update the NVIDIA Container Toolkit as described here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
I hit the same problem and solved it by updating Docker Desktop.
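Based on the reports in this thread, the failing combination looks like a new driver (555.85 was reported broken, 552.44 working) paired with an NVIDIA Container Toolkit older than 1.15.0 (the version shipped in Docker Desktop 4.31). A minimal sketch of that rule of thumb follows. It is a heuristic drawn only from these comments, not an official compatibility matrix: treating every driver at or above 555.85 as affected is an extrapolation, and the helper names and example version pairs are hypothetical.

```python
"""Heuristic check for the driver / container-toolkit combination in this issue.

Assumptions (taken from this thread, not from NVIDIA documentation):
- driver 555.85 triggered "Error 500: named symbol not found"; 552.44 did not;
- NVIDIA Container Toolkit 1.15.0 (in Docker Desktop 4.31) resolves it.
Treating every driver >= 555.85 as affected is an extrapolation.
"""


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '555.85' into (555, 85) or '1.15.0' into (1, 15, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


# First driver reported broken in this thread (assumed lower bound).
AFFECTED_DRIVER_MIN = parse_version("555.85")
# Toolkit release reported as the fix.
FIXED_TOOLKIT_MIN = parse_version("1.15.0")


def likely_affected(driver: str, toolkit: str) -> bool:
    """True if the driver is new enough to hit the bug and the toolkit predates the fix."""
    return (parse_version(driver) >= AFFECTED_DRIVER_MIN
            and parse_version(toolkit) < FIXED_TOOLKIT_MIN)


if __name__ == "__main__":
    # Hypothetical example inputs, not versions taken from anyone's setup.
    for drv, tk in [("555.85", "1.14.6"), ("552.44", "1.14.6"), ("555.85", "1.15.0")]:
        print(f"driver {drv}, toolkit {tk}: likely affected = {likely_affected(drv, tk)}")
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.9.0" would sort after "1.15.0".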
Has this issue been opened before?
Describe the bug
With the current NVIDIA driver released on 2024-05-21 (version 555.85), the container does not start and fails with:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
From my research, this comes from a version mismatch between the CUDA components. I tried to find an updated version of
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
with CUDA 12.5, but I couldn't find one yet. My workaround was to downgrade my NVIDIA driver to version 552.44, which made it work again.
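To check whether a container is hitting this particular failure, a small diagnostic like the one below may help. It is a sketch under two assumptions: that PyTorch is installed in the image, and that the error surfaces when PyTorch initializes CUDA, which `torch.cuda.init()` forces (`torch.cuda.is_available()` tends to swallow the same failure as a warning and just return `False`). The helper names are hypothetical, and `nvidia-smi` is only queried if it is on the `PATH`.

```python
import shutil
import subprocess


def parse_gpu_query(output: str) -> list[tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a "name, driver" row
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


def query_host_gpus() -> list[tuple[str, str]]:
    """Return (GPU name, driver version) pairs, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_query(result.stdout) if result.returncode == 0 else []


def describe_torch_cuda() -> str:
    """Report PyTorch's view of CUDA, surfacing the cudaGetDeviceCount() error if it occurs."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    try:
        # Forces CUDA initialization; with the broken driver/toolkit pair this is
        # where "Error 500: named symbol not found" is expected to appear.
        torch.cuda.init()
    except (RuntimeError, AssertionError) as err:  # AssertionError: CPU-only torch build
        return f"CUDA init failed: {err}"
    return (f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"devices={torch.cuda.device_count()}")


if __name__ == "__main__":
    for name, driver in query_host_gpus():
        print(f"GPU: {name}, driver {driver}")
    print(describe_torch_cuda())
```

Running it inside the container before and after a driver or toolkit change gives a quick yes/no on whether CUDA initialization succeeds, without starting the full UI.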
Which UI
Tested with auto.
Hardware / Software
Additional context
I opened this mainly as a reference in case other people run into the same problem.