Host decompression #18114

Open. Wants to merge 16 commits into base: branch-25.04


vuule (Contributor) commented Feb 27, 2025

Description

Add decompression APIs that make the use of nvCOMP transparent to callers.
Remove the direct dependency on nvCOMP from the ORC and Parquet readers.
Add multi-threaded host-side decompression. It is off by default and can only be enabled via the LIBCUDF_HOST_DECOMPRESSION environment variable.

Currently, host decompression adds D2H and H2D transfers. Avoiding these extra transfers would require large changes to the readers so that they no longer read the inputs on the device.
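
To illustrate the opt-in, multi-threaded host path described above, here is a minimal standalone sketch. It is not the code in this PR: `host_decompression_requested`, `rle_decode`, and `decompress_chunks_on_host` are hypothetical names, the run-length decoder is only a stand-in for a real codec, `std::async` stands in for the `host_worker_pool`, and treating the value `ON` as "enabled" is an assumption, since the description does not state which values the variable accepts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <future>
#include <string>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical toggle. Assumption: the feature is requested when the
// variable is set to "ON"; the PR description does not list accepted values.
bool host_decompression_requested()
{
  char const* value = std::getenv("LIBCUDF_HOST_DECOMPRESSION");
  return value != nullptr && std::string{value} == "ON";
}

// Stand-in codec (not a libcudf or nvCOMP function): decodes a sequence of
// (count, value) byte pairs into `count` copies of `value`.
bytes rle_decode(bytes const& in)
{
  bytes out;
  for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
    out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
  }
  return out;
}

// Decompresses independent chunks concurrently, one task per chunk.
// In the flow described above, the compressed chunks would first be copied
// device-to-host and the results copied back host-to-device; those copies
// are omitted from this host-only sketch.
std::vector<bytes> decompress_chunks_on_host(std::vector<bytes> const& chunks)
{
  std::vector<std::future<bytes>> tasks;
  tasks.reserve(chunks.size());
  for (auto const& chunk : chunks) {
    tasks.push_back(std::async(std::launch::async, [&chunk] { return rle_decode(chunk); }));
  }

  std::vector<bytes> results;
  results.reserve(chunks.size());
  for (auto& task : tasks) {
    results.push_back(task.get());  // Blocks until this chunk is done.
  }
  return results;
}
```

A caller would check `host_decompression_requested()` once and, if it returns true, pass the compressed chunks to `decompress_chunks_on_host`; otherwise it would take the device path. Results are returned in the same order as the input chunks because the futures are collected in submission order.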

Other changes:

  • Replaced the host compression thread pool with the use of host_worker_pool, which is used for host decompression as well.
  • Fixed logic in use_host_compression.
  • Expanded host compression tests.
  • Expanded error messages in compression.
  • Allocated scratch space inside gpu_debrotli to make its API consistent with the other decompression types.
  • Added GZIP to compress_max_allowed_chunk_size and compress_max_output_chunk_size to be able to use these functions for host GZIP compression.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Feb 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions bot added the libcudf label on Feb 27, 2025
vuule added the feature request and non-breaking labels on Feb 27, 2025
vuule (Contributor, Author) commented Feb 27, 2025

/ok to test

github-actions bot added the CMake label on Mar 3, 2025
vuule (Contributor, Author) commented Mar 3, 2025

/ok to test

@vuule vuule marked this pull request as ready for review March 3, 2025 21:50
@vuule vuule requested review from a team as code owners March 3, 2025 21:50
@vuule vuule requested review from devavret and lamarrr March 3, 2025 21:50