AIRUNTIME-171 - cooperative groups scan #5914
Merged: g-h-c merged 100 commits into develop from users/g-h-c/AIRUNTIME-171_cooperative_groups_scan (Jun 26, 2026)
Commits
249c30a AIRUNTIME-171 - add initial implemention of cg::inclusive_scan. Only … (g-h-c)
198af3c AIRUNTIME-171 - use __ockl_wfscan_add_i32() when __OPTIMIZE__ is defined (g-h-c)
7108afb AIRUNTIME-171 - implement inclusive_scan for any thread block tile si… (g-h-c)
bc6b59f AIRUNTIME-171 - make sure the OCKL intrinsic exist before using it (g-h-c)
c2f24ed AIRUNTIME-171 - try to avoid the cost of the conditional check to see… (g-h-c)
46430e2 Fix assertion being triggered in __hip_check_mask() for some cooopera… (g-h-c)
9713078 AIRUNTIME-171 - make calculateExpected() also calculate expected resu… (g-h-c)
a21813e AIRUNTIME-171 - add macro GENERATE_SCAN_FUNC() to generate the overlo… (g-h-c)
8a89634 AIRUNTIME-171 - refactor reduce tests so they can be reused for inclu… (g-h-c)
c1287e5 ROCM-1254 - remove Unit_Thread_Block_Tile_Reduce_Non_Participating_Th… (g-h-c)
8fefb10 Remove unused partialSum kernel (dead code with UB pattern) (Copilot)
cb177dd AIRUNTIME-171 - make sure the expected result in a reduce is the resu… (g-h-c)
2cdf83c AIRUNTIME-171 - fix that the reduction tree implementation in calcula… (g-h-c)
7bd6699 AIRUNTIME-171 - fix that floating point aggregations were not expecti… (g-h-c)
6c50a68 AIRUNTIME-171 - iterate the 'modulo' variable until nextPowerOf2(last… (g-h-c)
2eb0f0a AIRUNTIME-171 - fix we were calling compareFloatingPoint() even for l… (g-h-c)
063bf5b AIRUNTIME-171 - add support for cooperative_groups::less in cg::inclu… (g-h-c)
a809ed4 AIRUNTIME-171 - add Unit_Thread_Block_Tile_Scan_Random_arithmetic (g-h-c)
1c2ec54 AIRUNTIME-171 - extend cg::inclusive_scan() to support all types, not… (g-h-c)
c8c1e02 AIRUNTIME-171 - simplify code that invokes ockl scan intrinsics. Do n… (g-h-c)
44ab009 AIRUNTIME-171 - simplify mask generation logic (g-h-c)
bca8cea AIRUNTIME-171 - fix up RTC tests compilation (g-h-c)
15621ce AIRUNTIME-171 - add cg::exclusive_scan and some of the associated tests (g-h-c)
47af512 AIRUNTIME-171 - refactor duplicated code that calculate the group mask (g-h-c)
0240546 AIRUNTIME-171 - fix up previous commits (g-h-c)
c69a7f4 AIRUNTIME-171 - fix some scope Catch2 INFO() macro would have no effe… (g-h-c)
10aae09 AIRUNTIME-171 - fix up calculation of the expected value for cg::bit_xor (g-h-c)
6270127 AIRUNTIME-171 - implement cg::exclusive_scan() via a backward permute… (g-h-c)
08424e2 AIRUNTIME-171 - add more cg::exclusive_scan() tests (g-h-c)
f98e7f4 AIRUNTIME-171 - fix up one compareFloatingPoint() missing the Op temp… (g-h-c)
92da2b2 AIRUNTIME-171 - add bPermute() implementing backward permutes that ca… (g-h-c)
b7d08b4 AIRUNTIME-171 - start using bPermute. Remove function isPrimitiveType() (g-h-c)
b5d1bc0 AIRUNTIME-171 - use ockl functions for __half cooperative_group scans… (g-h-c)
f5e42d4 AIRUNTIME-171 - fix up add missing tests in cooperativeGrps.yaml. Fix… (g-h-c)
4756539 AIRUNTIME-171 - add Unit_Thread_Block_Tile_Scan_Trivially_Copyable_Pa… (g-h-c)
81d6d2c AIRUNTIME-171 - add inclusive/exclusive_scan benchmarks (g-h-c)
c3a8807 AIRUNTIME-171 - add more information when the output of the reduce be… (g-h-c)
4b927e6 AIRUNTIME-171 - remove ExcludeFirst template parameter as all threads… (g-h-c)
fd1a566 AIRUNTIME-171 - remove mask generation duplicated code in warpSync.cc (g-h-c)
dddb19f AIRUNTIME-171 - add Unit_Thread_Block_Tile_Scan_All_Parameter_Sizes (g-h-c)
b528231 AIRUNTIME-171 - remove outdated comments (g-h-c)
4c189e9 AIRUNTIME-171 - fix compilation error: std::memset() is not defined i… (g-h-c)
3ce865a AIRUNTIME-171 - fix duplicated variable (g-h-c)
cdcb38c AIRUNTIME-171 - fix integer constant not having the right suffix for … (g-h-c)
0af4069 AIRUNTIME-171 - fix use of undeclared variable (g-h-c)
c8434cc AIRUNTIME-171 - refactor rtc cooperative groups tests so they can be … (g-h-c)
f33fdf7 AIRUNTIME-171 - add Unit_Rtc_CoopScan (g-h-c)
b6b6e6b AIRUNTIME-171 - use 1ull instead of 1ul when dealing with 64-bit masks (g-h-c)
07a1916 AIRUNTIME-171 - 'unsigned long warpMask' should have been 'unsigned l… (g-h-c)
8276563 AIRUNTIME-171 - fix another use of 1ul when it should have 1ull. Also… (g-h-c)
e63e449 AIRUNTIME-171 - fix Unit_Thread_Block_Coalesced_Scan_boolean using ar… (g-h-c)
8c0be61 AIRUNTIME-171 - fix GENERATE_SCAN_FUNC() for fp16 when not in coopera… (g-h-c)
ee60c7b AIRUNTIME-171 - fix potential deadlock in applyFunctor() and applySca… (g-h-c)
89c96c5 AIRUNTIME-171 - add missing semicolon (g-h-c)
a912a38 AIRUNTIM-171 - Fix another use of 1ul and not 1ull on mask calculations (g-h-c)
faf2464 AIRUNTIME-171 - fix Unit_Rtc_CoopScan failing because calculateExpect… (g-h-c)
1670709 AIRUNTIME-171 - fix OCKL intrinsics not being called for scan (g-h-c)
389e2e4 AIRUNTIME-171 - fix OCKL boolean scan intrinsics not being called (g-h-c)
e41de8e Add hip_scan.h to hiprtc/CMakeLists.txt (g-h-c)
d812d73 AIRUNTIME-171 - fix that in cg::exclusive_scan the identity needs to … (g-h-c)
1b75779 AIRUNTIME-171 - implement __hip_internal::numeric_limits<T> using con… (g-h-c)
a2bc885 AIRUNTIME-171 - rename __hip_internal::numeric_limits<T>::max() to Nu… (g-h-c)
cd88bf8 AIRUNTIME-171 - fix invocation of opToString should have been: opToSt… (g-h-c)
eaada34 AIRUNTIME-171 - fix struct NumericLimits<float>::lowest() should retu… (g-h-c)
c396a2b AIRUNTIME-171 - fix rocsparse 'error: constexpr function never produc… (g-h-c)
fc40fc5 AIRUNTIME-171 - rename opToString() in rtc_reduce.cc to reduceOpToStr… (g-h-c)
a5d5206 AIRUNTIME-171 - fix that Unit_Rtc_CoopScan for cg::plus<float> used e… (g-h-c)
14f3c41 AIRUNTIME-171 - fix that NumericLimits<__half>::maximum() and lowest(… (g-h-c)
1c90b98 AIRUNTIME-171 - fix 'expected' variable assigned but never used (g-h-c)
3037743 AIRUNTIME-171 - remove halfWaveSize which is actually shadowed by ano… (g-h-c)
53b89c2 AIRUNTIME-171 - fix undefined behaviour that could happen if __builti… (g-h-c)
60623df AIRUNTIME-171 - try to implement NumericLimits<__half>::maximum() and… (g-h-c)
70d936b AIRUNTIME-171 - avoid 'error: change of the active member of a union … (g-h-c)
56e3170 AIRUNTIME-171 - fix expected value for __half in exclusive_scans tests (g-h-c)
eaf2c9d AIRUNTIME-171 - execute Unit_Thread_Block_Tile_Exclusive_Scan_Basic f… (g-h-c)
47921ba AIRUNTIME-171 - fix up previous commit, std::numeric_limits only need… (g-h-c)
cc332cd AIRUNTIME-171 - fix value for NumericLimits<float>::lowest() (g-h-c)
cfa954b AIRUNTIME-171 - fix Unit_Rtc_CoopReduce being too verbose (g-h-c)
e8516db AIRUNTIME-171 - make sure the operands and the return value are align… (g-h-c)
3142b93 AIRUNTIME-171 - prevent memcpy() in cooperative_groups::scan() to rea… (g-h-c)
12e99e6 AIRUNTIME-171 - fix memcpy() reading past Val also for reduce operations (g-h-c)
4d5bea1 AIRUNTIME-171 - make cooperative_groups::exclusive_scan() results for… (g-h-c)
5dd6707 Fix expected values for cg::exlusive_scan() for the first active lane (g-h-c)
ea17dc2 AIRUNTIME-171 - cosmetic changes (g-h-c)
cf2f101 AIRUNTIME-171 - rename GENERATE_SCAN_FUNC as HIP_IMPL_GENERATE_SCAN_F… (g-h-c)
7a62b59 AIRUNTIME-171 - fix typo in cooperativeGrps.yaml (g-h-c)
e4461bd AIRUNTIME-171 - fix Max::operator() contained an unnecessary extra loop (g-h-c)
ae7d88f AIRUNTIME-171 - fix whitespace errors (g-h-c)
a533ae6 AIRUNTIME-171 - add missing nvidia_hip_cooperative_groups_scan.h (g-h-c)
041bc79 AIRUNTIME-171 - add cg::inclusive_scan/exclusive_scan() overloads tha… (g-h-c)
fc4cadf AIRUNTIME-171 - fix Unit_Thread_Block_Tile_Inclusive_Scan_Basic failu… (g-h-c)
ec5857e AIRUNTIME-171 - fix that cooperative_groups/scan.h would not compile … (g-h-c)
78e0ee3 AIRUNTIME-171 - add test Unit_Thread_Block_Coalesced_Scan_Partition (g-h-c)
4761780 AIRUNTIME-171 - fix that scan should use coalesced_info.member_mask a… (g-h-c)
0217805 AIRUNTIME-171 - add a test for partitioned thread_block_tiles too (g-h-c)
c224faa AIRUNTIME-171 - fix order of parameters when applying scan operations… (g-h-c)
56fcc84 AIRUNTIME-171 - return {} as identity for custom types as the result … (g-h-c)
1aabf99 AIRUNTIME-171 - change cooperative_group::impl::buildMask() to use in… (g-h-c)
7baca86 SWDEV-515087 - replace __lane_id() with a manual calculation; otherwi… (g-h-c)
bc62602 ROCM-26483 - make sure cooperative_groups::reduce() will now work wit… (g-h-c)
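Several commits above concern how an exclusive scan relates to an inclusive one (identity for the first active lane, results taken from the preceding active lane) and the use of `1ull` for 64-bit lane masks. The following is a minimal host-side sketch of those semantics for addition only. It is not code from this PR: the function names `inclusive_scan_ref` and `exclusive_scan_ref` are hypothetical, and the real implementation runs on the device with wavefront intrinsics rather than a sequential loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (assumption, not part of the PR):
// inclusive prefix sum over the lanes whose bit is set in `mask`.
// Inactive lanes are left at 0.
std::vector<int> inclusive_scan_ref(const std::vector<int>& vals, std::uint64_t mask) {
    std::vector<int> out(vals.size(), 0);
    int running = 0;
    for (std::size_t lane = 0; lane < vals.size() && lane < 64; ++lane) {
        // 1ull guarantees at least 64 bits; 1ul can be 32 bits wide on
        // some platforms, which would break masks for lanes 32..63.
        if (mask & (1ull << lane)) {
            running += vals[lane];
            out[lane] = running;
        }
    }
    return out;
}

// Hypothetical reference helper (assumption, not part of the PR):
// exclusive prefix sum derived from the inclusive one. Each active lane
// takes the inclusive result of the previous active lane; the first
// active lane receives the identity of the operation (0 for addition).
std::vector<int> exclusive_scan_ref(const std::vector<int>& vals, std::uint64_t mask) {
    const std::vector<int> inc = inclusive_scan_ref(vals, mask);
    std::vector<int> out(vals.size(), 0);
    int prev = 0;  // identity for addition
    for (std::size_t lane = 0; lane < vals.size() && lane < 64; ++lane) {
        if (mask & (1ull << lane)) {
            out[lane] = prev;
            prev = inc[lane];
        }
    }
    return out;
}
```

With values `{1, 2, 3, 4}` and mask `0b1011` (lanes 0, 1 and 3 active), the inclusive result on the active lanes is 1, 3, 7 and the exclusive result is 0, 1, 3.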