Steps performed:

1. Open MPS on the physical machine: nvidia-cuda-mps-control -d
2. Start the container: docker run -it --ipc=true --gpus device=7 vllm
3. Run the API server: python3 -m vllm.entrypoints.openai.api_server...

This reports the error:

RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 805: MPS client failed to connect to the MPS control daemon or the MPS server.

Is it supported to start MPS on the physical machine and use it to share GPUs across multiple containers?
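For Error 805 in a container, the usual cause is that the CUDA runtime inside the container cannot reach the host daemon's named pipes. Below is a minimal sketch of wiring that up, assuming the default pipe directory /tmp/nvidia-mps and the image name vllm from the steps above; the GPU-dependent daemon start is commented out so the snippet runs anywhere:

```shell
# The MPS control daemon and its clients rendezvous through named pipes in
# CUDA_MPS_PIPE_DIRECTORY (default: /tmp/nvidia-mps). Export it on the host
# before starting the daemon so the path is explicit.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
# nvidia-cuda-mps-control -d        # start the daemon (needs the NVIDIA driver)

# The container must share the host IPC namespace (--ipc=host, not --ipc=true)
# and mount the same pipe directory so the client inside can find the daemon.
DOCKER_CMD="docker run -it --ipc=host --gpus device=7 \
-v ${CUDA_MPS_PIPE_DIRECTORY}:${CUDA_MPS_PIPE_DIRECTORY} \
-e CUDA_MPS_PIPE_DIRECTORY=${CUDA_MPS_PIPE_DIRECTORY} \
vllm"
echo "${DOCKER_CMD}"                # inspect before running on a GPU host
```

Containers started this way all talk to the single daemon on the host, so each CUDA process inside them becomes an MPS client of the same server.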
I can now use MPS after starting the MPS server inside the container. Is the failure to connect to the MPS server on the physical machine caused by CUDA not being installed on the physical machine?
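Whether this is a missing-install problem or a visibility problem can be checked directly from inside the container: the daemon advertises itself through a named pipe in the pipe directory. A small diagnostic sketch, assuming the default directory and the "control" pipe name used by recent drivers (the daemon query is commented out so the snippet does not require a running daemon):

```shell
# Error 805 means the client could not reach the MPS control daemon.
# The daemon listens on a named pipe "control" in CUDA_MPS_PIPE_DIRECTORY.
PIPE_DIR="${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"

if [ -p "${PIPE_DIR}/control" ]; then
  STATUS="control pipe present in ${PIPE_DIR}"
  # Ask the daemon which MPS servers are running:
  # echo get_server_list | nvidia-cuda-mps-control
else
  STATUS="no control pipe in ${PIPE_DIR}; clients here will hit Error 805"
fi
echo "${STATUS}"
```

If the pipe is missing inside the container but present on the host, the daemon is fine and the container simply cannot see it (mount the pipe directory in); if it is missing on the host too, the daemon never started there.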
Environment:
os: redhat7.9
docker version: 20.10.21
nvidia-container-runtime: 3.13.0-1
nvidia-container-toolkit: 1.13.5-1