Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix skipped elements processing in vectorized_process #2596

Closed
wants to merge 1 commit into from

Conversation

qieqieplus
Copy link

Changes:

  • Replaced the if condition for processing skipped elements with a for loop that correctly distributes the work.

This ensures all elements are processed.

@qieqieplus qieqieplus requested a review from a team as a code owner February 27, 2025 11:56
Copy link

copy-pr-bot bot commented Feb 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Feb 27, 2025
@achirkin
Copy link
Contributor

Thanks for submitting this pull request! Do you have a reproducer for this? Normally, the presumption is that the number of skipped elements (which is smaller than the alignment requirements) is always smaller than the grid size.

@qieqieplus
Copy link
Author

Thanks for submitting this pull request! Do you have a reproducer for this? Normally, the presumption is that the number of skipped elements (which is smaller than the alignment requirements) is always smaller than the grid size.

Yes, this is only a theoretical bug (I found this function very useful and used this in another kernel, and my simple unittest broke). In practice, there are always enough threads.

@achirkin
Copy link
Contributor

achirkin commented Feb 28, 2025

Ok, although I see the point in your suggestion, I'm inclined to close this PR.
Having if there is rather intentional: (1) it tells the compiler that it doesn't need those precious registers for the extra variable, and (2) it tells the human reader that we only access a handful of elements on both ends and do that in a single parallel read.

PS: if you need this coalesced processing functionality, you may also be interested in a more generic version of the same idea at https://github.com/rapidsai/raft/blob/branch-25.04/cpp/include/raft/matrix/detail/linewise_op.cuh

@achirkin achirkin closed this Feb 28, 2025
@qieqieplus
Copy link
Author

Thanks for your reply and suggestion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants