@artv3
Member

@artv3 artv3 commented Nov 25, 2025

This PR is a collaboration space for exploring optimizations within RAJA::launch and the loop abstraction.

@artv3 artv3 marked this pull request as ready for review December 2, 2025 17:09
@artv3 artv3 requested a review from a team December 2, 2025 17:09
@artv3
Member Author

artv3 commented Dec 2, 2025

Hi @LLNL/raja-core, in collaboration with AMD staff we found a key optimization for nested loops within RAJA::launch.
This would be nice to have for the upcoming release. The downside is that it introduces another set of policies similar to the existing RAJA::hip_thread_loop_{x,y,z} policies. One thought is to retire the old policies in favor of the new ones, but are the old ones worth keeping for performance tracking?

@artv3 artv3 changed the title Exploring loop optimizations RAJA launch Thread loop optimizations RAJA launch Dec 2, 2025
@MrBurmark
Member

I wonder if we can't use the same policies but with an extra template argument to choose between the global and context versions of the variables.
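
One way to read this suggestion, sketched in plain host C++17 so it compiles without a GPU toolchain. Every name here (`thread_loop_x`, `Dim3`, `Context`, `g_thread_id`, `g_block_dim`, `visited`) is a hypothetical stand-in rather than one of RAJA's actual policies or internals. The two globals only model the built-in `threadIdx` and `blockDim` variables.

```cpp
#include <vector>

// Hypothetical stand-in for the CUDA/HIP dim3 type.
struct Dim3 { int x, y, z; };

// Stand-ins for the built-in threadIdx / blockDim "global" variables.
inline Dim3 g_thread_id{0, 0, 0};
inline Dim3 g_block_dim{1, 1, 1};

// Hypothetical launch context that caches copies of the same values.
struct Context {
  Dim3 thread_id;
  Dim3 block_dim;
};

// One policy name; the template argument selects where the indices
// come from: false = globals, true = values cached in the context.
template <bool UseContext = false>
struct thread_loop_x {};

// Block-stride loop over [0, len) in the x dimension.
template <bool UseContext, typename Body>
void loop(thread_loop_x<UseContext>, Context const& ctx, int len, Body&& body)
{
  int start, stride;
  if constexpr (UseContext) {
    start  = ctx.thread_id.x;
    stride = ctx.block_dim.x;
  } else {
    start  = g_thread_id.x;
    stride = g_block_dim.x;
  }
  for (int i = start; i < len; i += stride) {
    body(i);
  }
}

// Helper for experimentation: collect the indices one "thread" visits.
template <bool UseContext>
std::vector<int> visited(Context const& ctx, int len)
{
  std::vector<int> out;
  loop(thread_loop_x<UseContext>{}, ctx, len, [&](int i) { out.push_back(i); });
  return out;
}
```

With a default of `false`, existing uses of the policy name would keep their current behavior under this sketch, and only call sites that opt in would read from the context.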

@rhornung67
Member

I don't think it's a problem to keep the old policies and have a lot of alternatives for folks to try. We do need to work on documenting policies better and have a comprehensive cookbook of examples that clearly shows the differences between policy choices, including usage, performance, and how to choose.

@MrBurmark
Member

Do we want to make populating the context variables optional in case we find any overhead there?

@MrBurmark
Member

Any thoughts on blockIdx and gridDim?

@artv3
Member Author

artv3 commented Dec 2, 2025

> Do we want to make populating the context variables optional in case we find any overhead there?

Oh, for register-heavy kernels? That makes sense.

@artv3
Member Author

artv3 commented Dec 2, 2025

> Any thoughts on blockIdx and gridDim?

I see pros and cons. Pro: it could be handy for completeness. Con: it takes up more registers. Maybe we can do partial specializations of the launch context or something like that. These may be less common use cases, though.
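
A rough sketch of what opt-in storage could look like, assuming the context is assembled from small storage bases that are specialized to be empty when a group of values is not requested. The names (`LaunchContextSketch`, `ThreadDims`, `GridDims`, `hasThreadDims`, `hasGridDims`) are hypothetical and are not RAJA's launch context. `sizeof` is used only as a host-side proxy; actual register use on a GPU depends on the compiler.

```cpp
// Hypothetical stand-in for the CUDA/HIP dim3 type.
struct Dim3 { int x, y, z; };

// Storage for the thread-level values; empty unless enabled.
template <bool Enabled>
struct ThreadDims {};

template <>
struct ThreadDims<true> {
  Dim3 thread_id{0, 0, 0};
  Dim3 block_dim{1, 1, 1};
};

// Storage for the block-level values; empty unless enabled.
template <bool Enabled>
struct GridDims {};

template <>
struct GridDims<true> {
  Dim3 block_id{0, 0, 0};
  Dim3 grid_dim{1, 1, 1};
};

// Hypothetical context: each group of values is stored only on request.
template <bool StoreThread, bool StoreGrid>
struct LaunchContextSketch : ThreadDims<StoreThread>, GridDims<StoreGrid> {
  static constexpr bool hasThreadDims = StoreThread;
  static constexpr bool hasGridDims   = StoreGrid;
  void* shared_mem_ptr = nullptr;
};

// The fully populated context is larger than the minimal one.
static_assert(sizeof(LaunchContextSketch<false, false>) <
                  sizeof(LaunchContextSketch<true, true>),
              "optional storage should add to the context size");
```

Under this layout a kernel that never needs `blockIdx` or `gridDim` would pick `LaunchContextSketch<true, false>` and pay nothing for the unused group.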

@rhornung67 rhornung67 added this to the Dec 2025 Release milestone Dec 2, 2025
@MrBurmark
Member

Ya, I'm imagining the context and some policies both having a switch. Then in the loop implementation it checks that if the policy uses the switch then the context must have the same switch.

template < bool has_switch >
struct Policy {};

template < bool has_switch >
struct Context {};

template < bool policy_switch, bool context_switch >
void loop(Policy<policy_switch>, Context<context_switch>)
{
  // A policy that uses the switch requires a context that provides it.
  static_assert(!policy_switch || context_switch,
                "If policy has switch then context must have switch");
}

Comment on lines 279 to 280
for (int i = ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.thread_id);
     i < len; i += ::RAJA::internal::CudaDimHelper<DIM>::get(ctx.block_dim))
Member

Use if constexpr to get the values based on StoreDim3?

Member Author

Can you share an example?

Member

See line 165 above: if constexpr (LaunchContextT<LaunchContextPolicy>::hasDim3)

Contributor

I wanted to echo that I think this is a good idea

Contributor

Not sure if that's possible, though, because ctx is a function parameter.

Member

We were able to get the argument type if there is one operator() that is not templated and has at least one argument. If that isn't true then it will use the default.
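
A minimal sketch of that kind of deduction for callable objects, assuming a detection-idiom approach with `std::void_t`. The names `first_arg`, `context_of`, `DefaultContext`, and `Dim3Context` are hypothetical and are not the traits added in this PR. Function pointers are not handled here.

```cpp
#include <type_traits>

// Hypothetical context types standing in for the real launch contexts.
struct DefaultContext {};
struct Dim3Context {};

// First parameter type of a member-function pointer. Any shape not
// matched below (for example, no parameters) yields the default.
template <typename T>
struct first_arg { using type = DefaultContext; };

template <typename C, typename R, typename A0, typename... Rest>
struct first_arg<R (C::*)(A0, Rest...) const> { using type = A0; };

template <typename C, typename R, typename A0, typename... Rest>
struct first_arg<R (C::*)(A0, Rest...)> { using type = A0; };

// Primary template: chosen when &F::operator() is ill-formed, which is
// the case for generic lambdas (templated operator()) and for callables
// with overloaded call operators. Falls back to the default context.
template <typename F, typename = void>
struct context_of { using type = DefaultContext; };

// Chosen when F has exactly one non-templated operator().
template <typename F>
struct context_of<F, std::void_t<decltype(&F::operator())>> {
  using type =
      std::decay_t<typename first_arg<decltype(&F::operator())>::type>;
};

template <typename F>
using context_of_t = typename context_of<F>::type;
```

In this sketch a lambda written as `[](Dim3Context& ctx) { ... }` resolves to `Dim3Context`, while `[](auto& ctx) { ... }` and a lambda taking no arguments both resolve to `DefaultContext`.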

@artv3
Member Author

artv3 commented Dec 18, 2025

@MrBurmark, do you have time to take a look? I think I pushed up the ideas we had yesterday.

@artv3 artv3 requested a review from a team December 18, 2025 18:40
//is pointer a type just pass it through otherwise do give me the operator
//static error if not a function type

//template <typename Lambda>
Contributor

Looks like there might be some cruft here; we should delete it before merging.

// - If T is a function pointer, use function_traits<T> directly
// - Otherwise, assume it is a callable object and use &T::operator()
template <typename T>
using lambda_traits =
Contributor

very cool

ctx.shared_mem_ptr = raja_shmem_ptr;

RAJA::expt::invoke_body(reduce_params, body, ctx);
if constexpr (LaunchContextT<LaunchContextPolicy>::hasDim3)
Contributor

I think we need to use your new type traits here as well, similar to the HIP backend?

template<named_dim DIM, typename Dim3Like>
RAJA_INLINE RAJA_DEVICE int get_dim(Dim3Like const& d)
{
if constexpr (DIM == named_dim::x)
Member Author

@MrBurmark @johnbowen42, is this the usage of if constexpr you all had in mind?

Member

@MrBurmark MrBurmark Dec 19, 2025

No, I was thinking about doing if constexpr (LaunchContextType::hasDim3) in the exec functions instead of adding new policies. Also, there is already a function to get the named dims from a dim3.
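
A host-only sketch of that idea, assuming the context type exposes a static `hasDim3` flag as in the snippet quoted earlier in the thread. Everything else (`PlainContext`, `CachedContext`, `exec`, `g_thread_id`, `g_block_dim`, and this `thread_loop_x` tag) is a hypothetical stand-in, not RAJA code. The branch reads the flag through the context type, so it does not depend on the `ctx` function parameter itself being a constant expression.

```cpp
#include <vector>

// Hypothetical stand-in for the CUDA/HIP dim3 type.
struct Dim3 { int x, y, z; };

// Stand-ins for the built-in threadIdx / blockDim variables.
inline Dim3 g_thread_id{0, 0, 0};
inline Dim3 g_block_dim{1, 1, 1};

// Two hypothetical context types; only one caches the dim3 values.
struct PlainContext {
  static constexpr bool hasDim3 = false;
};

struct CachedContext {
  static constexpr bool hasDim3 = true;
  Dim3 thread_id;
  Dim3 block_dim;
};

// A single policy tag, with no second set of policies.
struct thread_loop_x {};

// The exec function branches on the context type at compile time.
template <typename ContextT, typename Body>
void exec(thread_loop_x, ContextT const& ctx, int len, Body&& body)
{
  (void)ctx;  // unused when the context caches nothing
  int start, stride;
  if constexpr (ContextT::hasDim3) {
    // Discarded for PlainContext, so its missing members are never
    // referenced there.
    start  = ctx.thread_id.x;
    stride = ctx.block_dim.x;
  } else {
    start  = g_thread_id.x;
    stride = g_block_dim.x;
  }
  for (int i = start; i < len; i += stride) {
    body(i);
  }
}
```

Compared with adding new policies, user code under this sketch keeps the same policy name and changes behavior only through the context type it launches with.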

Member Author

Ohh, that makes sense now.

5 participants