Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA]: Thrust-NVRTC Support #4305

Open
1 task done
lamarrr opened this issue Apr 1, 2025 · 3 comments
Open
1 task done

[FEA]: Thrust-NVRTC Support #4305

lamarrr opened this issue Apr 1, 2025 · 3 comments
Labels
feature request New feature or request.

Comments

@lamarrr
Copy link

lamarrr commented Apr 1, 2025

Is this a duplicate?

Area

Thrust

Is your feature request related to a problem? Please describe.

CUDF is adopting JIT for kernel compilation via JITIFY/NVRTC.

NVRTC unlike NVCC requires that the source files only contain device code. No host code or headers, even if they aren't used. Support for it is not planned either. This has prevented us from adopting JITIFY for our device kernels, as our dependencies don't support this use case either.

We need Thrust to support inclusion from offline-compiled device-only JITIFY code.

Describe the solution you'd like

  • Disable code sections that are not supported in JITIFY code when in JITIFY mode.
  • Use macros to only enable sequential algorithm specializations (i.e. thrust::seq) when in JITIFY mode
  • Prevent inclusion of host headers in JITIFY mode (i.e., pthread.h). Ideally, it should only include cuda/std/* headers
  • Add a CI job specifically for JITIFY compilation tests

Describe alternatives you've considered

Patching thrust to support JITIFY code - Not feasible or practical

Additional context

Here are some of the thrust headers we use in CUDF's device code:

  • thrust/advance.h
  • thrust/distance.h
  • thrust/equal.h
  • thrust/fill.h
  • thrust/find.h
  • thrust/generate.h
  • thrust/iterator/constant_iterator.h
  • thrust/iterator/counting_iterator.h
  • thrust/iterator/discard_iterator.h
  • thrust/iterator/reverse_iterator.h
  • thrust/iterator/transform_input_output_iterator.h
  • thrust/iterator/transform_iterator.h
  • thrust/iterator/transform_output_iterator.h
  • thrust/iterator/zip_iterator.h
  • thrust/limits.h
  • thrust/logical.h
  • thrust/memory.h
  • thrust/mismatch.h
  • thrust/optional.h
  • thrust/pair.h
  • thrust/transform_reduce.h
  • thrust/tuple.h
@lamarrr lamarrr added the feature request New feature or request. label Apr 1, 2025
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 1, 2025
@jrhemstad
Copy link
Collaborator

Thanks @lamarrr.

I think to down scope this a bit, it would helpful if you could go through and do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents. Everything in cuda/std is already tested with NVRTC today and it will be less work for libcudf to update its usage of thrust:: symbols to cuda::std:: symbols:

For example, without actually looking, I believe all of the following have direct replacements in cuda/std:

  • thrust/tuple.h
  • thrust/pair.h
  • thrust/limits.h
  • thrust/optional.h
  • thrust/advance.h
  • thrust/logical.h
  • thrust/distance.h
  • thrust/mismatch.h
  • thrust/equal.h
  • thrust/fill.h
  • thrust/find.h
  • thrust/generate.h

Naively, I believe all that would really remain would be making Thrust's fancy iterators (or replacements in cuda::) work with NVRTC.

@bernhardmgruber
Copy link
Contributor

bernhardmgruber commented Apr 1, 2025

AFAIK, all Thrust iterators already work under NVRTC since at least CCCL 3.0, see #3676. Furthermore, if you only need Thrust's sequential algorithm implementations, you will be able to get by with just <cuda/std/algorithm>, which is almost finished here: #3741. limits.h, pair.h, tuple.h, optional.h have direct equivalent's in <cuda/std/...> as well.

Once #3741 lands, let's narrow down your list to what's still missing.

@PointKernel
Copy link
Member

do a pass first of looking at what headers and symbols in libcudf can just be replaced with cuda/std equivalents.

Agreed. Most of the issues should be solved if we use the cuda::std:: equivalents. In our offline discussions, @lamarrr noted that the zip iterator is a more complex fancy iterator to handle compared to others. A potential workaround is to use a counting iterator together with a transform iterator instead: https://godbolt.org/z/5fdMqecoP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request.
Projects
Status: Todo
Development

No branches or pull requests

4 participants