Conversation

@amroakmal

Summary

  • This PR is a (refactoring, bugfix, feature, something else)
  • It does the following (modify list as needed):
    • Modifies/refactors (class or method) (how?)
    • Fixes (issue number(s))
    • Adds (specific feature) at the request of (project or person)

@amroakmal amroakmal requested a review from MrBurmark August 13, 2025 19:18
@amroakmal amroakmal self-assigned this Aug 13, 2025
@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch from 23b5d62 to db20399 Compare August 13, 2025 19:26
});

for (RepIndex_type extra_rep = 0; extra_rep < 5; ++extra_rep) {
temp_count += extra_rep * (extra_rep & 1);
Member

We can try adding EMPTY_BODY (https://github.com/LLNL/RAJAPerf/blob/cda42470851fff2b7c8e6a9b5b11ab83f33a5a07/src/basic/EMPTY.hpp#L29) if the compiler optimizes out the loop.
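To illustrate the concern in general terms: a loop whose result is never observed can be removed entirely by the optimizer. The sketch below is not the EMPTY_BODY macro (its definition is not shown in this thread); it is a generic alternative that stores the result to a volatile object. The names `g_sink` and `run_extra_reps` and the `RepIndex_type` alias are assumptions made for illustration.

```cpp
#include <cstdint>

// Assumption: stand-in for the suite's RepIndex_type.
using RepIndex_type = std::int64_t;

// Hypothetical sink. A store to a volatile object is an observable side
// effect, so the value written here cannot be discarded as dead code.
volatile std::int64_t g_sink = 0;

std::int64_t run_extra_reps(RepIndex_type extra_reps) {
  std::int64_t temp_count = 0;
  for (RepIndex_type extra_rep = 0; extra_rep < extra_reps; ++extra_rep) {
    // Same body as the loop in the diff: adds only the odd indices.
    temp_count += extra_rep * (extra_rep & 1);
  }
  g_sink = temp_count;  // keeps the result live
  return temp_count;
}
```

This only guarantees that the final value is stored; the compiler may still simplify how that value is computed, so it is a partial safeguard at best.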

@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch from db20399 to 822ce7f Compare August 13, 2025 19:39
@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch 3 times, most recently from 17f3f8f to 6cbc640 Compare August 13, 2025 19:47
@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch from 6cbc640 to 04510eb Compare August 13, 2025 22:10
if (Row < N && Col < N) \
C[Col + N * Row] = Cs[ty][tx];

constexpr int extra_kernel_reps = 5;
Member

You'll want to make a local variable of this and put it into the DATA_SETUP macro

Member

It's probably best to add this as a member variable of KernelBase so the compiler doesn't know the value of extra_kernel_reps at compile time.
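A minimal sketch of that idea, assuming a heavily simplified `KernelBase`. The member `m_extra_kernel_reps`, the accessor `getExtraKernelReps`, and the derived class `ExampleKernel` are hypothetical names chosen for illustration, not the suite's actual API.

```cpp
// Hypothetical, simplified stand-in for the suite's KernelBase.
class KernelBase {
public:
  explicit KernelBase(int extra_kernel_reps)
      : m_extra_kernel_reps(extra_kernel_reps) {}
  virtual ~KernelBase() = default;

  int getExtraKernelReps() const { return m_extra_kernel_reps; }

private:
  // Plain runtime member instead of a constexpr constant.
  int m_extra_kernel_reps;
};

// Hypothetical kernel that reads the rep count from the base class.
class ExampleKernel : public KernelBase {
public:
  using KernelBase::KernelBase;

  long runExtraReps() const {
    // Local copy, similar in spirit to what a DATA_SETUP macro might declare.
    const int extra_kernel_reps = getExtraKernelReps();
    long launches = 0;
    for (int rep = 0; rep < extra_kernel_reps; ++rep) {
      ++launches;  // placeholder for re-running the kernel body
    }
    return launches;
  }
};
```

For the value to be genuinely unknown at compile time it would need to come from a runtime source such as a command-line option; if a literal is passed where the compiler can see it, inlining may still expose the constant.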

Author

Sure. For now I've added it as you advised in the first comment, and I'll also try what you suggested in this one. Thanks.

@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch from b65aef2 to 001d11e Compare August 14, 2025 15:17
@MrBurmark
Member

I expect this approach to cause the tests to fail, so don't worry too much about getting the tests to pass.

@amroakmal amroakmal force-pushed the temp-add-outer-loop-for-kernel branch from 001d11e to dd132de Compare August 14, 2025 16:30
3 participants