
Could not initialize class com.nvidia.spark.ml.linalg.JniRAPIDSML #73

@pxLi

Description

The ML JNI Maven build succeeds, but when we test the resulting artifact with the Spark plugin in a fresh environment (without the conda environment used for the build), it throws the following error:

[2022-05-31T03:28:40.517Z] 22/05/31 03:28:40 WARN TaskSetManager: 
Lost task 6.0 in stage 5.0 (TID 33) (10.233.109.181 executor 0): 
java.lang.UnsatisfiedLinkError: /tmp/librapidsml_jni.so5201224938898577270: 
libarrow_cuda.so.700: cannot open shared object file: 
No such file or directory
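To list every dependency the dynamic loader cannot resolve (not just the first one reported in the log), you can run `ldd` on a copy of librapidsml_jni.so extracted from the artifact. The following is a minimal sketch that assumes glibc's usual `ldd` output format (`libfoo.so => not found`); `missing_libs` and `check_library` are hypothetical helper names, not part of the project.

```python
import re
import subprocess
from typing import List

# Matches glibc ldd lines such as "\tlibarrow_cuda.so.700 => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")


def missing_libs(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reports as 'not found', in output order."""
    missing = []
    for line in ldd_output.splitlines():
        m = _NOT_FOUND.match(line)
        if m:
            missing.append(m.group(1))
    return missing


def check_library(path: str) -> List[str]:
    """Run ldd on `path` (a shared library) and return its unresolved sonames."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libs(result.stdout)
```

Running `check_library(...)` against the extracted library in the fresh environment should list libarrow_cuda.so.700 if the log above is representative; the path you pass is up to you.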

The only change we found is a new flag in the CMake-generated link line when building against the conda cudf 22.06 packages.
Previous (before 22.06.00a220530, e.g. 22.06.00a220519):

LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib  -Wl,-Bstatic  -lcudart_static  -lcusparse_static  
-lcusolver_static  -lculibos  -llapack_static  -Wl,-Bdynamic  /root/miniconda3/lib/libcudf.so  /usr/local/cuda/lib64/libcublas.so 
 /root/miniconda3/lib/libarrow.so.700.0.0  /root/miniconda3/lib/libarrow_cuda.so.700.0.0  -ldl  -lpthread  
/usr/local/cuda/lib64/libcudart.so  /usr/lib64/libcuda.so  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl

Now (22.06.00a220530):

LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib  -Wl,-Bstatic  -lcudart_static  -lcusparse_static  
-lcusolver_static  -lculibos  -llapack_static  -Wl,-Bdynamic  /root/miniconda3/lib/libcudf.so  /usr/local/cuda/lib64/libcublas.so 
 /root/miniconda3/lib/libarrow.so.700.0.0  /root/miniconda3/lib/libarrow_cuda.so.700.0.0  -ldl  -lpthread 
/usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl  
-Wl,-rpath-link,/root/miniconda3/lib

The difference is the added -Wl,-rpath-link,/root/miniconda3/lib, which first appeared with these packages:

cudf                      22.06.00a220530 cuda_11_py38_gdcb04704b3_316    rapidsai-nightly
libcudf                   22.06.00a220530 cuda11_gdcb04704b3_316    rapidsai-nightly
arrow-cpp                 7.0.0           py38he106920_7_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
pyarrow                   7.0.0           py38h17143e8_7_cuda    conda-forge
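Because the two link lines are long, a token-level diff is a quick way to confirm that the added flag is the only difference. The following is a minimal sketch; both strings are copied from the link lines above and split on whitespace, and `added_flags` is a hypothetical helper.

```python
from typing import List

# Link line from builds before 22.06.00a220530 (copied from the issue).
PREVIOUS = (
    "-Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib -Wl,-Bstatic -lcudart_static "
    "-lcusparse_static -lcusolver_static -lculibos -llapack_static -Wl,-Bdynamic "
    "/root/miniconda3/lib/libcudf.so /usr/local/cuda/lib64/libcublas.so "
    "/root/miniconda3/lib/libarrow.so.700.0.0 /root/miniconda3/lib/libarrow_cuda.so.700.0.0 "
    "-ldl -lpthread /usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so "
    "-lcudadevrt -lcudart_static -lrt -lpthread -ldl"
)

# Link line from the 22.06.00a220530 build (copied from the issue).
NOW = (
    "-Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib -Wl,-Bstatic -lcudart_static "
    "-lcusparse_static -lcusolver_static -lculibos -llapack_static -Wl,-Bdynamic "
    "/root/miniconda3/lib/libcudf.so /usr/local/cuda/lib64/libcublas.so "
    "/root/miniconda3/lib/libarrow.so.700.0.0 /root/miniconda3/lib/libarrow_cuda.so.700.0.0 "
    "-ldl -lpthread /usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so "
    "-lcudadevrt -lcudart_static -lrt -lpthread -ldl "
    "-Wl,-rpath-link,/root/miniconda3/lib"
)


def added_flags(old: str, new: str) -> List[str]:
    """Tokens present in `new` but absent from `old`, in the order they appear in `new`."""
    old_tokens = set(old.split())
    return [tok for tok in new.split() if tok not in old_tokens]
```

`added_flags(PREVIOUS, NOW)` yields only `-Wl,-rpath-link,/root/miniconda3/lib`, and swapping the arguments yields nothing. This is a set-based comparison, so it ignores reordering and repeated flags (such as the two `-lpthread`), and link order can matter for static libraries. For reference, GNU ld documents `-rpath` as adding a directory to the runtime search path recorded in the output, while `-rpath-link` only tells the linker where to find shared libraries required by other shared libraries at link time and is not recorded in the output.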

Perhaps the dependency tree is broken in the latest cudf package on conda?
Tests using ML JNI artifacts built against cudf packages older than 22.06.00a220530 worked fine.
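One way to test the dependency-tree hypothesis is to compare the dynamic sections of a working and a failing librapidsml_jni.so with `readelf -d` and see whether the newer one records extra NEEDED libraries or different search paths. The following is a minimal sketch that assumes GNU readelf's usual `(NEEDED) Shared library: [...]` and `(RUNPATH) Library runpath: [...]` line format. `dynamic_deps` and `compare` are hypothetical helpers.

```python
import re
from typing import Dict, List

# GNU `readelf -d` prints dynamic-section entries such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libcudf.so]
#  0x000000000000001d (RUNPATH)            Library runpath: [/a/lib:/b/lib]
_ENTRY = re.compile(r"\((NEEDED|RUNPATH|RPATH)\)\s+[^\[]*\[([^\]]*)\]")

TAGS = ("NEEDED", "RUNPATH", "RPATH")


def dynamic_deps(readelf_output: str) -> Dict[str, List[str]]:
    """Collect NEEDED sonames and RUNPATH/RPATH directories from `readelf -d` text."""
    result: Dict[str, List[str]] = {tag: [] for tag in TAGS}
    for line in readelf_output.splitlines():
        m = _ENTRY.search(line)
        if not m:
            continue
        tag, value = m.groups()
        if tag == "NEEDED":
            result[tag].append(value)
        else:
            # Search paths are colon-separated; drop empty components.
            result[tag].extend(p for p in value.split(":") if p)
    return result


def compare(old_output: str, new_output: str) -> Dict[str, List[str]]:
    """For each tag, list entries present in the new build but not in the old one."""
    old, new = dynamic_deps(old_output), dynamic_deps(new_output)
    return {tag: sorted(set(new[tag]) - set(old[tag])) for tag in TAGS}
```

Pass in the captured `readelf -d` output for each library. Note that both link lines embed /root/miniconda3/lib via `-rpath`. That directory will not exist in a fresh environment, so a library found only there cannot be resolved unless it is also available elsewhere on the loader's search path.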

Metadata

Assignees

No one assigned

Labels

P1, bug (Something isn't working)
