Describe the bug
I was surprised to see extra calls to cuLibraryLoadData just before the first decompress_page_data range when LIBCUDF_HOST_DECOMPRESSION=AUTO is set. This happens even with CUDA_MODULE_LOADING=EAGER.
This library-loading region does not appear when LIBCUDF_HOST_DECOMPRESSION is unset or set to ON. In the PDS benchmark it adds an estimated 25 ms per query.
The library loading appears to happen while sorting the blocks, probably in SortPairsDescending.
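To compare the three configurations side by side, a small Python helper like the one below can profile the same query once per LIBCUDF_HOST_DECOMPRESSION setting under Nsight Systems, with CUDA_MODULE_LOADING=EAGER forced in every run. This is a sketch, not part of libcudf: the script name repro_query.py and the report names are hypothetical placeholders. It relies only on the standard nsys flags --trace=cuda,nvtx (to capture CUDA API calls such as cuLibraryLoadData alongside NVTX ranges such as decompress_page_data) and -o (output report name).

```python
import shlex

# LIBCUDF_HOST_DECOMPRESSION settings compared in this report; None means "unset".
HOST_DECOMP_SETTINGS = [None, "ON", "AUTO"]


def build_env(host_decomp, base_env=None):
    """Return a copy of the environment with eager module loading and the given
    LIBCUDF_HOST_DECOMPRESSION setting (the variable is removed when None)."""
    import os

    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_MODULE_LOADING"] = "EAGER"
    if host_decomp is None:
        env.pop("LIBCUDF_HOST_DECOMPRESSION", None)
    else:
        env["LIBCUDF_HOST_DECOMPRESSION"] = host_decomp
    return env


def nsys_command(host_decomp, app_cmd):
    """Build an Nsight Systems command that traces CUDA API calls and NVTX
    ranges, writing one report per setting (e.g. host_decomp_AUTO)."""
    label = host_decomp or "unset"
    return ["nsys", "profile", "--trace=cuda,nvtx",
            "-o", f"host_decomp_{label}"] + list(app_cmd)


if __name__ == "__main__":
    import subprocess
    import sys

    # Hypothetical reproducer script; replace with the actual benchmark query.
    app = [sys.executable, "repro_query.py"]
    run = "--run" in sys.argv[1:]  # only print the commands unless --run is given
    for setting in HOST_DECOMP_SETTINGS:
        cmd = nsys_command(setting, app)
        print(f"LIBCUDF_HOST_DECOMPRESSION={setting or '<unset>'}: {shlex.join(cmd)}")
        if run:
            subprocess.run(cmd, env=build_env(setting), check=True)
```

Opening the three reports and checking where cuLibraryLoadData falls relative to the first decompress_page_data range should show whether the extra loads appear only in the AUTO run.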
Expected behavior
All cuLibraryLoadData calls should happen at the beginning when CUDA_MODULE_LOADING=EAGER is set.