

[BUG] Assertion failure, file index :cufio-udev #346

Open
wyli opened this issue Jul 26, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@wyli

wyli commented Jul 26, 2022

Describe the bug
Error when importing cucim:

Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cucim
Assertion failure, file index :cufio-udev  line :134
Aborted (core dumped)

Expected behavior
No error when importing.

Environment details:

  • Environment location: Ubuntu 20.04.4 LTS
  • GPU driver: NVIDIA-SMI 515.48.07, Driver Version: 515.48.07, CUDA Version: 11.7
  • Method of cuCIM install: pip install cucim (cucim-22.6.0)
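To reproduce the failure without the abort taking down an interactive session, the import can be attempted in a child interpreter. The sketch below is a minimal illustration using only the standard `subprocess` and `signal` modules; `run_snippet` and `check_import` are hypothetical helper names, not part of cuCIM.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh Python interpreter and describe how it exited.

    Because the code runs in a child process, a native abort (such as the
    cuFile assertion above) terminates only the child, not the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    # stdout followed by stderr (not interleaved), trimmed.
    output = (proc.stdout + proc.stderr).strip()
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        try:
            status = f"killed by {signal.Signals(-proc.returncode).name}"
        except ValueError:
            status = f"killed by signal {-proc.returncode}"
    else:
        status = f"exit code {proc.returncode}"
    return f"{status}: {output}" if output else status


def check_import(module: str = "cucim") -> str:
    """Import `module` in isolation and report the result."""
    return run_snippet(f"import {module}")


if __name__ == "__main__":
    print(check_import("cucim"))
```

Since `Aborted (core dumped)` is what the shell prints for a process terminated by SIGABRT, the report on an affected machine would presumably start with `killed by SIGABRT`; a missing package shows up as `exit code 1` with the traceback instead.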
wyli added the bug label on Jul 26, 2022
@jakirkham
Member

Sorry you ran into issues at import.

Does running nvidia-smi work? If so, what does that show?

@wyli
Author

wyli commented Jul 27, 2022

nvidia-smi works fine and correctly displays the GPU info. Here is the first row of its output: NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7

@gigony
Contributor

gigony commented Jul 27, 2022

Hi @wyli, thank you for the report.

The error seems to be related to the GDS (GPUDirect Storage) library, which is installed by default with CUDA 11.7.

Could you please share the output of the following commands?

/usr/local/cuda/gds/tools/gdscheck -v
/usr/local/cuda/gds/tools/gdscheck -p

lspci -tv

If you can find a cufile.log file in the current folder, please also share its contents.

Is cucim imported on the bare-metal system (not in a Docker container)?
What happens if you install cucim inside a container?
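The diagnostics requested above can be gathered in one pass with a small script. This is a hypothetical helper, not a cuCIM or GDS tool: it assumes the default `/usr/local/cuda` install path used in the commands above, and it records missing commands, non-zero exits, and signal kills rather than stopping at the first failure.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Commands requested above; the gdscheck path assumes the default CUDA location.
GDSCHECK = "/usr/local/cuda/gds/tools/gdscheck"
COMMANDS = [
    [GDSCHECK, "-v"],
    [GDSCHECK, "-p"],
    ["lspci", "-tv"],
]


def run(cmd: List[str]) -> str:
    """Run one diagnostic command and return its output with a `$ cmd` header."""
    lines = ["$ " + " ".join(cmd)]
    # shutil.which checks absolute paths directly and searches PATH otherwise.
    if shutil.which(cmd[0]) is None:
        lines.append("(not found)")
        return "\n".join(lines)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # stdout followed by stderr (not interleaved).
    out = (proc.stdout + proc.stderr).rstrip()
    if out:
        lines.append(out)
    if proc.returncode < 0:
        # A negative return code -N means the process was killed by signal N.
        lines.append(f"(killed by signal {-proc.returncode})")
    elif proc.returncode > 0:
        lines.append(f"(exit code {proc.returncode})")
    return "\n".join(lines)


def collect(log_dir: Path = Path(".")) -> str:
    """Run every command and append cufile.log from `log_dir` if it exists."""
    sections = [run(cmd) for cmd in COMMANDS]
    log = log_dir / "cufile.log"
    if log.is_file():
        sections.append(f"$ cat {log}\n{log.read_text()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(collect())
```

One caveat: when stdout is a pipe rather than a terminal, a C/C++ program's output is usually block-buffered, and buffered output that was never flushed can be lost if the program aborts. Partial listings such as the `gdscheck -p` output may therefore be missing from the captured text, in which case running the command directly in a terminal is the fallback.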

@wyli
Author

wyli commented Jul 27, 2022

Thanks, here are the logs:

nvidiauser@ams-monai:~$ /usr/local/cuda/gds/tools/gdscheck -v
Assertion failure, file index :cufio-udev  line :134
Aborted (core dumped)
nvidiauser@ams-monai:~$ /usr/local/cuda/gds/tools/gdscheck -p
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Disabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384 
 properties.posix_pool_slab_count : 128 64 32 
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 =========
 GPU INFO:
 =========
 GPU index 0 Tesla T4 bar:1 bar size (MiB):256 supports GDS
 ==============
 PLATFORM INFO:
 ==============
Assertion failure, file index :cufio-udev  line :134
Aborted (core dumped)
nvidiauser@ams-monai:~$ lspci -tv
-+-[b69d:00]---02.0  Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
 +-[0001:00]---00.0  NVIDIA Corporation TU104GL [Tesla T4]
 \-[0000:00]-+-00.0  Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled)
             +-07.0  Intel Corporation 82371AB/EB/MB PIIX4 ISA
             +-07.1  Intel Corporation 82371AB/EB/MB PIIX4 IDE
             +-07.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
             \-08.0  Microsoft Corporation Hyper-V virtual VGA


nvidiauser@ams-monai:~$ cat cufile.log
 26-07-2022 20:51:21:996 [pid=3955 tid=3955] NOTICE  cufio-drv:705 running in compatible mode
 26-07-2022 21:02:43:171 [pid=4944 tid=4944] NOTICE  cufio-drv:705 running in compatible mode
 27-07-2022 17:31:02:141 [pid=2223 tid=2223] NOTICE  cufio-drv:705 running in compatible mode
 27-07-2022 17:31:10:495 [pid=2228 tid=2228] NOTICE  cufio-drv:705 running in compatible mode

@maunzzz

maunzzz commented Mar 23, 2023

I ran into the same problem. Is a solution available yet?

@khyll

khyll commented May 21, 2024

I also get this error:

>>> import cucim
Assertion failure, file index :cufio-udev  line :134
Aborted (core dumped)

I'm running an NV6ads_A10_v5 virtual machine with Ubuntu 22.04.

NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2

RAPIDS and cucim were installed with conda:

Name      Version   Build                           Channel
cucim     24.04.00  cuda11_py39_240410_ga24abfd_0   rapidsai
libcucim  24.04.00  cuda11_240410_ga24abfd_0        rapidsai

Is a fix for this problem in the pipeline? I'd really like to use cucim.

5 participants