@mxxw mxxw commented Sep 10, 2025

Modified RAJAPerf/src/apps/{FIR-Hip.cpp,FIR.hpp}
to use shared (LDS) memory in the Base_HIP variant,
reducing pressure on the vL1D and L2 caches. This
resulted in a > 1.5x speedup under ROCm-6.4.0
for --size 100000000 (1E8).
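
For readers without the diff open, here is a rough CPU-side sketch of the tiling pattern that an LDS-based FIR kernel typically follows. It is not the code from this PR. All names (kCoeffLen, kBlockSize, fir_direct, fir_tiled), the filter length of 16, and the block size of 256 are assumptions for illustration. Each "block" copies its tile of input plus a halo of kCoeffLen - 1 elements into a small local buffer (standing in for LDS) and then computes its outputs from that buffer, so each input element is fetched from main memory once per tile instead of up to kCoeffLen times.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical names and sizes; none of these are taken from the PR's diff.
constexpr std::size_t kCoeffLen = 16;    // assumed number of filter taps
constexpr std::size_t kBlockSize = 256;  // assumed outputs per block

// Reference version: every output reads its kCoeffLen inputs directly from `in`.
// Requires in.size() >= n_out + kCoeffLen - 1 and coeff.size() >= kCoeffLen.
std::vector<double> fir_direct(const std::vector<double>& in,
                               const std::vector<double>& coeff,
                               std::size_t n_out) {
  std::vector<double> out(n_out, 0.0);
  for (std::size_t i = 0; i < n_out; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < kCoeffLen; ++j) sum += coeff[j] * in[i + j];
    out[i] = sum;
  }
  return out;
}

// Tiled version: stage a tile plus halo into `tile`, then compute from it.
// In a GPU kernel, `tile` would be a __shared__ array, the staging loop would
// be split across the threads of a block, and a block-wide barrier would sit
// between the staging loop and the compute loop.
std::vector<double> fir_tiled(const std::vector<double>& in,
                              const std::vector<double>& coeff,
                              std::size_t n_out) {
  std::vector<double> out(n_out, 0.0);
  double tile[kBlockSize + kCoeffLen - 1];
  for (std::size_t base = 0; base < n_out; base += kBlockSize) {
    const std::size_t n_here = std::min(kBlockSize, n_out - base);
    const std::size_t n_load = n_here + kCoeffLen - 1;  // tile plus halo
    for (std::size_t t = 0; t < n_load; ++t) tile[t] = in[base + t];
    for (std::size_t t = 0; t < n_here; ++t) {
      double sum = 0.0;
      for (std::size_t j = 0; j < kCoeffLen; ++j) sum += coeff[j] * tile[t + j];
      out[base + t] = sum;
    }
  }
  return out;
}
```

Both functions accumulate in the same order, so their results should match exactly; the tiled form only changes where the inputs are read from.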
