Contributor

@michaelmckinsey1 michaelmckinsey1 commented Jul 31, 2025

Summary

  • This PR is a feature
  • It does the following:
    • Modifies src/common/KernelBase.hpp/cpp to add new kernel attributes for:
      1. MaxLoopDimensions: Number of levels in the largest nested loop.
      2. MaxPerfectLoopDimensions: Number of levels in the largest perfectly nested loop.
      3. MaxArrayDimensions: Number of dimensions in the highest-dimensionality array.
      4. NumArrays: Total number of arrays initialized in the kernel.
      5. BatchSize: Number of executions between global synchronization points. Decided not to proceed with this. It is too difficult to define what this attribute should mean (e.g. it can depend on the tuning, as in the case of shared_replication), and it is not clear how the attribute would be used. The closest existing information is the Launch feature, for RAJA team-level parallelism.
    • Also adds this information to Caliper.
    • Adds these attributes to src/*/*.cpp
    • Adds new kernel information at the request of PAVE team
    • Updates docs
    • Possible other attributes:
    • Devise example to prove usefulness of proposed attributes
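The four attributes that remain in the proposal can be pictured as a small record attached to each kernel. The following is a minimal sketch only: `KernelAttributesSketch`, its member names, and `isConsistent` are hypothetical and are not taken from KernelBase.hpp. The one relation it checks follows from the definitions above: a perfectly nested loop is itself a nested loop, so MaxPerfectLoopDimensions should never exceed MaxLoopDimensions.

```cpp
#include <cstddef>

// Hypothetical container for the proposed attributes. The type and member
// names are illustrative and do not come from KernelBase.hpp.
struct KernelAttributesSketch {
  std::size_t max_loop_dimensions = 0;          // levels in the largest nested loop
  std::size_t max_perfect_loop_dimensions = 0;  // levels in the largest perfectly nested loop
  std::size_t max_array_dimensions = 0;         // rank of the highest-dimensionality array
  std::size_t num_arrays = 0;                   // total number of arrays in the kernel

  // A perfect nest is still a nest, so it cannot be deeper than the
  // deepest loop nest recorded for the same kernel.
  bool isConsistent() const {
    return max_perfect_loop_dimensions <= max_loop_dimensions;
  }
};
```

A check like this could run once per kernel at setup time to catch attribute values that contradict each other.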

setComplexity(Complexity::N);

setNestedLoops(0);
setArrayDimensions(1);
Member

@MrBurmark MrBurmark Aug 15, 2025

By Array Dimensions, are you basically asking whether the problem the kernel works on has 1, 2, 3, etc. dimensions? How about calling it Dimensions, Dimensionality, or something like that?
There are some kernels where the problem and the arrays have differing dimensionalities. For example, LTIMES has a 4d loop that goes over 2d and 3d arrays.

Contributor Author

I think we can rename these to MaxLoopDimensions and MaxArrayDimensions to clarify that we are interested in recording the largest-dimensionality loop and array. So for LTIMES, MaxLoopDimensions=4 and MaxArrayDimensions=3.
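As a sketch of the "largest dimensionality" rule described here: given the depths of a kernel's loop nests, or the ranks of its arrays, the attribute is simply the maximum of the list. The helper name `maxDimension` is hypothetical and not part of RAJAPerf. The LTIMES inputs used in the comment come from the description in this thread (a 4d loop over 2d and 3d arrays).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the largest entry in a list of loop-nest
// depths or array ranks, or 0 when the list is empty.
std::size_t maxDimension(const std::vector<std::size_t>& dims) {
  if (dims.empty()) {
    return 0;
  }
  return *std::max_element(dims.begin(), dims.end());
}

// For LTIMES as described in this thread:
//   maxDimension({4})    gives MaxLoopDimensions  = 4
//   maxDimension({2, 3}) gives MaxArrayDimensions = 3
```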

Contributor Author

@michaelmckinsey1 michaelmckinsey1 Aug 18, 2025

setNestedLoops(0);
setArrayDimensions(1);
setNumArrays(1);
Member

In most cases this is simple, but in some cases as mentioned for LTIMES there are arrays of differing dimensionalities. Do you have a good idea of what you want to count here or not?

Contributor Author

@michaelmckinsey1 michaelmckinsey1 Aug 18, 2025

We want to count all arrays regardless of dimensionality. That is, if an array is included in the BytesPerRep count, we are going to count it here.
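One way to read this rule is that NumArrays and BytesPerRep are derived from the same list of arrays, so the two cannot disagree about which arrays count. The sketch below assumes a hypothetical per-array record (`ArrayUse`) and hypothetical helpers `bytesPerRep` and `numArrays`; it does not reflect how RAJAPerf actually computes BytesPerRep.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one array used by a kernel.
struct ArrayUse {
  std::string name;
  std::size_t rank;           // dimensionality; ignored when counting
  std::size_t bytes_per_rep;  // bytes this array contributes per repetition
};

// Sum of the bytes contributed by every array.
std::size_t bytesPerRep(const std::vector<ArrayUse>& arrays) {
  std::size_t total = 0;
  for (const ArrayUse& a : arrays) {
    total += a.bytes_per_rep;
  }
  return total;
}

// Count every array that contributes to the byte total, whatever its rank.
std::size_t numArrays(const std::vector<ArrayUse>& arrays) {
  std::size_t count = 0;
  for (const ArrayUse& a : arrays) {
    if (a.bytes_per_rep > 0) {
      ++count;
    }
  }
  return count;
}
```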

@michaelmckinsey1 michaelmckinsey1 self-assigned this Aug 25, 2025
@MrBurmark
Member

Regarding BatchSize, are you ultimately trying to count the number of GPU synchronizations? I assume you don't mean OpenMP barriers, though this is something that should be covered by the KernelsPerRep attribute. I think that the number of GPU synchronizations should be the same for all tunings, but we can check.

@michaelmckinsey1
Contributor Author

It is possible that something like problemDimensionality would be more informative than MaxArrayDimensions. For example, https://github.com/LLNL/RAJAPerf/blob/develop/src/apps/ZONAL_ACCUMULATION_3D.hpp has problemDimensionality=3, but MaxArrayDimensions=1.

@michaelmckinsey1
Contributor Author

Also consider algorithmParallelism vs. complexity: e.g. Polybench_ATAX and Polybench_MVT have O(n) complexity but algorithmParallelism=O(sqrt(n)).

Comparison sorts are O(n*lg(n)), but there is only O(n) parallelism.

algorithmParallelism - "how parallel the algorithm implementation is in RAJAPerf"
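To illustrate the gap between complexity and parallelism, here is a minimal sketch of a row-parallel matrix-vector product, the building block of kernels such as ATAX and MVT. It assumes that the problem size n is the number of matrix entries (n = N*N) and that only the outer loop over rows is parallelized. Both are assumptions made for illustration, not a description of the RAJAPerf implementations. Under those assumptions the work is O(n), while the number of independent iterations is N = sqrt(n).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for a row-major N x N matrix A (hypothetical helper).
// Total work is N*N = n multiply-adds, but only the N = sqrt(n) outer
// iterations are independent; the inner loop is a sequential reduction.
std::vector<double> matVec(const std::vector<double>& A,
                           const std::vector<double>& x) {
  const std::size_t N = x.size();
  assert(A.size() == N * N);  // precondition: A is N x N
  std::vector<double> y(N, 0.0);
  for (std::size_t i = 0; i < N; ++i) {    // independent rows: parallel candidates
    double sum = 0.0;
    for (std::size_t j = 0; j < N; ++j) {  // per-row reduction
      sum += A[i * N + j] * x[j];
    }
    y[i] = sum;
  }
  return y;
}
```

Parallelizing the inner reduction as well would expose more parallelism, which is why the attribute is described as a property of the implementation and not only of the algorithm.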
