The ML JNI mvn build succeeds, but when we test it with the Spark plugin in a fresh environment (without the conda env used for the build), it throws this error:
[2022-05-31T03:28:40.517Z] 22/05/31 03:28:40 WARN TaskSetManager:
Lost task 6.0 in stage 5.0 (TID 33) (10.233.109.181 executor 0):
java.lang.UnsatisfiedLinkError: /tmp/librapidsml_jni.so5201224938898577270:
libarrow_cuda.so.700: cannot open shared object file:
No such file or directory
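One way to narrow this down is to list which shared libraries the built librapidsml_jni.so declares as NEEDED and which RPATH/RUNPATH entries it carries. The following is a minimal sketch, assuming binutils `readelf` is available on PATH; the function names `parse_dynamic_section` and `inspect` are hypothetical helpers written for this illustration, not part of the project.

```python
import re
import subprocess
import sys

# Matches `readelf -d` entries of the form:
#   0x... (NEEDED)   Shared library: [libfoo.so.1]
#   0x... (RUNPATH)  Library runpath: [/dir/a:/dir/b]
_ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def parse_dynamic_section(text):
    """Collect NEEDED, RPATH and RUNPATH entries from `readelf -d` output."""
    found = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in text.splitlines():
        match = _ENTRY.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "NEEDED":
            found[tag].append(value)
        else:
            # Search paths are colon-separated; skip empty components.
            found[tag].extend(p for p in value.split(":") if p)
    return found


def inspect(path):
    """Run `readelf -d` on a shared object (assumes readelf is installed)."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return parse_dynamic_section(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    for tag, values in inspect(sys.argv[1]).items():
        for value in values:
            print(f"{tag}: {value}")
```

If libarrow_cuda.so.700 shows up under NEEDED and the only search paths point into the build machine's conda directory, the loader on a machine without that directory would have nowhere to find it.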
The only change we found is a new linker flag in the generated link line when building against conda cudf 22.06.
previous (before 22.06.00a220530, e.g. 22.06.00a220519)
LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib -Wl,-Bstatic -lcudart_static -lcusparse_static
-lcusolver_static -lculibos -llapack_static -Wl,-Bdynamic /root/miniconda3/lib/libcudf.so /usr/local/cuda/lib64/libcublas.so
/root/miniconda3/lib/libarrow.so.700.0.0 /root/miniconda3/lib/libarrow_cuda.so.700.0.0 -ldl -lpthread
/usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl
now (22.06.00a220530)
LINK_LIBRARIES = -Wl,-rpath,/usr/local/cuda/lib64:/root/miniconda3/lib -Wl,-Bstatic -lcudart_static -lcusparse_static
-lcusolver_static -lculibos -llapack_static -Wl,-Bdynamic /root/miniconda3/lib/libcudf.so /usr/local/cuda/lib64/libcublas.so
/root/miniconda3/lib/libarrow.so.700.0.0 /root/miniconda3/lib/libarrow_cuda.so.700.0.0 -ldl -lpthread
/usr/local/cuda/lib64/libcudart.so /usr/lib64/libcuda.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl
-Wl,-rpath-link,/root/miniconda3/lib
The new link line adds -Wl,-rpath-link,/root/miniconda3/lib. It appears with these conda packages:
cudf            22.06.00a220530  cuda_11_py38_gdcb04704b3_316  rapidsai-nightly
libcudf         22.06.00a220530  cuda11_gdcb04704b3_316        rapidsai-nightly
arrow-cpp       7.0.0            py38he106920_7_cuda           conda-forge
arrow-cpp-proc  3.0.0            cuda                          conda-forge
pyarrow         7.0.0            py38h17143e8_7_cuda           conda-forge
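The comparison of the two LINK_LIBRARIES lines can also be done mechanically rather than by eye. This is a minimal sketch; `split_link_flags` and `added_flags` are hypothetical helper names introduced for illustration. It compares the flags as multisets, because entries such as -lpthread and -ldl legitimately appear more than once.

```python
from collections import Counter


def split_link_flags(line):
    """Split a link line into whitespace-separated flags.

    A leading "LINK_LIBRARIES =" prefix, if present, is dropped.
    """
    text = line.strip()
    if text.startswith("LINK_LIBRARIES"):
        # Only split on the first '=' so flags containing '=' survive.
        text = text.partition("=")[2]
    return text.split()


def added_flags(old_line, new_line):
    """Return flags present in new_line but not accounted for in old_line.

    Repeated flags are matched one-for-one, preserving the order in which
    the extra flags appear in new_line.
    """
    remaining = Counter(split_link_flags(old_line))
    added = []
    for flag in split_link_flags(new_line):
        if remaining[flag] > 0:
            remaining[flag] -= 1
        else:
            added.append(flag)
    return added
```

Applied to the two link lines quoted above, `added_flags(previous, current)` should report only -Wl,-rpath-link,/root/miniconda3/lib. Note that GNU ld documents -rpath-link as a link-time search path, so the flag by itself may be a symptom of the packaging change rather than the direct cause of the runtime failure.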
Perhaps the dependency tree is broken in the latest cudf package on conda?
Tests using ML JNI artifacts built against cudf packages before 22.06.00a220530 worked fine.