-
Notifications
You must be signed in to change notification settings - Fork 110
Thread loop optimizations RAJA launch #1949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Hi @LLNL/raja-core, in collaboration with AMD staff we found a key optimization for nested loops within RAJA::launch. |
|
I wonder if we can't use the same policies but with an extra template argument to choose between the global and context versions of the variables. |
|
I don't think it's a problem to keep the old policies and have a lot of alternatives for folks to try. We do need to work on documenting policies better and have a comprehensive cookbook of examples that clearly show the differences between policies choices, including usage, performance, and how to choose. |
|
Do we want to make populating the context variables optional in case we find any overhead there? |
|
Any thoughts on blockIdx and gridDim? |
oh for register heavy kernels? that makes sense |
I see pro and cons, pro - for completeness could be handy, con - takes up more registers. Maybe we can do partial specializations of the launch context or something like that. This may be less common use cases though. |
|
Ya, I'm imagining the context and some policies both having a switch. Then in the loop implementation it checks that if the policy uses the switch then the context must have the same switch. template < bool switch >
struct Policy;
template < bool switch >
struct Context;
template < bool policy_switch, bool context_switch >
void loop(Policy<policy_switch>, Context<context_switch>)
{
static_assert(!policy_switch || (policy_switch && context_switch),
"If policy has switch then context must have switch");
} |
include/RAJA/policy/cuda/launch.hpp
Outdated
| for (int i = ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.thread_id); | ||
| i < len; i += ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.block_dim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If constexpr to get the values based on StoreDim3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 165 above if constexpr (LaunchContextT<LaunchContextPolicy>::hasDim3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to echo that I think this is a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if possible though because ctx is a function parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were able to get the argument type if there is one operator() that is not templated and has at least one argument. If that isn't true then it will use the default.
Co-authored-by: Jason Burmark <[email protected]>
|
@MrBurmark , do you have time to take a look? I think I pushed up the ideas we had yesterday |
| //is pointer a type just pass it through otherwise do give me the operator | ||
| //static error if not a function type | ||
|
|
||
| //template <typename Lambda> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like there might be some cruft here, we should delete before merging
| // - If T is a function pointer, use function_traits<T> directly | ||
| // - Otherwise, assume it is a callable object and use &T::operator() | ||
| template <typename T> | ||
| using lambda_traits = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool
include/RAJA/policy/cuda/launch.hpp
Outdated
| for (int i = ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.thread_id); | ||
| i < len; i += ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.block_dim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to echo that I think this is a good idea
include/RAJA/policy/cuda/launch.hpp
Outdated
| for (int i = ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.thread_id); | ||
| i < len; i += ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.block_dim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if possible though because ctx is a function parameter
include/RAJA/policy/cuda/launch.hpp
Outdated
| ctx.shared_mem_ptr = raja_shmem_ptr; | ||
|
|
||
| RAJA::expt::invoke_body(reduce_params, body, ctx); | ||
| if constexpr (LaunchContextT<LaunchContextPolicy>::hasDim3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to use your new type traits here as well? similar to the HIP backend
| template<named_dim DIM, typename Dim3Like> | ||
| RAJA_INLINE RAJA_DEVICE int get_dim(Dim3Like const& d) | ||
| { | ||
| if constexpr (DIM == named_dim::x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MrBurmark @johnbowen42 , is this the usage of if constexpr you all had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I was thinking about doing if constexcpr (LaunchContextType::hasDim3) in the exec functions instead of adding new policies. Also there is already a function to get the named dims from a dim3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh.. that makes sense now
This PR is a collaboration space for exploring optimization within RAJA launch and the loop abstraction