Fix skipped elements processing in vectorized_process #2596

qieqieplus · 2025-02-27T11:56:45Z

Changes:

Replaced the `if` condition that processes the skipped elements with a `for` loop that distributes them across the available threads.

This ensures all skipped elements are processed, even when there are more of them than threads.
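The following minimal host-side C++ sketch illustrates the difference. It models each CUDA thread as one iteration of an outer loop. It is not the actual `vectorized_process` code from RAFT. The names `process_skipped_if`, `process_skipped_loop`, and `all_visited_once`, and the `visits` counter array, are hypothetical; they only record how many times each skipped element is touched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: the outer loop over tid stands in for the parallel launch,
// and visits[i] counts how often skipped element i was processed.

// Original pattern: one guarded access per thread.
// Covers every element only when n_skipped <= n_threads.
void process_skipped_if(std::vector<int>& visits, std::size_t n_skipped,
                        std::size_t n_threads) {
  for (std::size_t tid = 0; tid < n_threads; ++tid) {
    if (tid < n_skipped) { visits[tid] += 1; }
  }
}

// Proposed pattern: a thread-stride loop, so every element is covered
// regardless of how n_skipped compares to n_threads.
void process_skipped_loop(std::vector<int>& visits, std::size_t n_skipped,
                          std::size_t n_threads) {
  for (std::size_t tid = 0; tid < n_threads; ++tid) {
    for (std::size_t i = tid; i < n_skipped; i += n_threads) { visits[i] += 1; }
  }
}

// True when every skipped element was processed exactly once.
bool all_visited_once(const std::vector<int>& visits) {
  for (int v : visits) {
    if (v != 1) return false;
  }
  return true;
}
```

With 5 skipped elements and only 3 threads, `process_skipped_if` leaves elements 3 and 4 untouched, while `process_skipped_loop` visits each element exactly once. When there are at least as many threads as skipped elements, both versions give the same result.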

copy-pr-bot · 2025-02-27T11:56:48Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

achirkin · 2025-02-27T12:07:51Z

Thanks for submitting this pull request! Do you have a reproducer for this? Normally, the presumption is that the number of skipped elements (which is smaller than the alignment requirements) is always smaller than the grid size.
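A short calculation shows why the skipped count stays small. With vector loads of `VecBytes` bytes, fewer than `VecBytes / sizeof(T)` elements can precede the first aligned address. Likewise, fewer than that many can follow the last full vector. The sketch below computes both counts for an array that starts at address `addr`. It assumes a 16-byte vector width and an address that is a multiple of `sizeof(T)`. The helper `skipped_elements` and the struct `SkipCounts` are hypothetical and are not part of RAFT.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: elements not covered by aligned vector loads.
struct SkipCounts {
  std::size_t head;  // elements before the first VecBytes-aligned address
  std::size_t tail;  // elements after the last full vector
};

// Assumes addr is a multiple of sizeof(T) and VecBytes is a multiple of sizeof(T).
template <typename T, std::size_t VecBytes = 16>
SkipCounts skipped_elements(std::uintptr_t addr, std::size_t n) {
  constexpr std::size_t kVecElems = VecBytes / sizeof(T);       // elements per vector load
  const std::size_t misalign = (addr % VecBytes) / sizeof(T);   // elements past the last boundary
  std::size_t head = (misalign == 0) ? 0 : kVecElems - misalign;  // elements up to the next boundary
  if (head > n) head = n;                                        // very short array: all of it is head
  const std::size_t tail = (n - head) % kVecElems;               // remainder after full vectors
  return {head, tail};
}
```

For `float` with 16-byte loads, `head` and `tail` are each at most 3, so at most 6 elements fall outside the vectorized loop. That is far fewer than the number of threads in a typical launch, and it is the bound the `if` guard relies on.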

qieqieplus · 2025-02-28T02:48:42Z

> Thanks for submitting this pull request! Do you have a reproducer for this? Normally, the presumption is that the number of skipped elements (which is smaller than the alignment requirements) is always smaller than the grid size.

Yes, although this is only a theoretical bug: I found this function very useful and reused it in another kernel, where my simple unit test broke. In practice, there are always enough threads.

achirkin · 2025-02-28T06:24:55Z

OK, although I see the point of your suggestion, I'm inclined to close this PR.
Having the `if` there is intentional: (1) it tells the compiler that it doesn't need those precious registers for the extra variable, and (2) it tells the human reader that we only access a handful of elements on both ends and do that in a single parallel read.

PS: if you need this coalesced processing functionality, you may also be interested in a more generic version of the same idea at https://github.com/rapidsai/raft/blob/branch-25.04/cpp/include/raft/matrix/detail/linewise_op.cuh

qieqieplus · 2025-02-28T06:34:42Z

Thanks for your reply and suggestion!

Commit 8e960d4: Fix skipped elements processing in vectorized_process

qieqieplus requested a review from a team as a code owner February 27, 2025 11:56

github-actions bot added the cpp label Feb 27, 2025

achirkin closed this Feb 28, 2025
