[MISC] Switch dedupe contact sort to use Quadrants bitonic sort by hughperkins · Pull Request #2853 · Genesis-Embodied-AI/genesis-world

hughperkins · 2026-05-27T21:31:19Z

Replaces the inlined 15-stage bitonic compare-exchange schedule in func_clamp_prune_and_sort_contacts_coop (phase 1a) with a one-line call to the new Quadrants subgroup primitive:

my_key, my_idx = qd.simt.subgroup.bitonic_sort_kv_tiled(my_key, my_idx, 5)

The primitive (added in quadrants hp/bitonic-sort-kv) is a @qd.func that inlines at compile time and unrolls the same 15 compare-exchange stages this code used to write inline, so the generated kernel IR is bit-identical to today on CUDA. Net change: ~30 lines of hand-rolled bitonic code removed, the sentinel load + write-back wrapper is unchanged, and the rest of the kernel (clamp + key init + bucket walk + phase 2 + phase 3) is untouched.

log2_size = 5 pins the sort to a 32-lane tile, matching the kernel's hard-coded block_dim = _K = 32. Using the tiled form rather than the bare bitonic_sort_kv(...) wrapper keeps the sort width fixed at 32 even on AMDGPU wave64, where the bare wrapper would otherwise sort across all 64 lanes and mix in garbage from the inactive upper half.

Requires the matching quadrants change to be installed (the public symbol qd.simt.subgroup.bitonic_sort_kv_tiled is added by that PR).

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

I read the CONTRIBUTING document.
I followed the Submitting Code Changes section of CONTRIBUTING document.
I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
I updated the documentation accordingly or no change is needed.
I tested my changes and added instructions on how to test it for reviewers.

I have added tests to cover my changes.
All new and existing tests passed.

…rt_kv_tiled Replaces the inlined 15-stage bitonic compare-exchange schedule in ``func_clamp_prune_and_sort_contacts_coop`` (phase 1a) with a one-line call to the new Quadrants subgroup primitive: my_key, my_idx = qd.simt.subgroup.bitonic_sort_kv_tiled(my_key, my_idx, 5) The primitive (added in quadrants hp/bitonic-sort-kv) is a @qd.func that inlines at compile time and unrolls the same 15 compare-exchange stages this code used to write inline, so the generated kernel IR is bit-identical to today on CUDA. Net change: ~30 lines of hand-rolled bitonic code removed, the sentinel load + write-back wrapper is unchanged, and the rest of the kernel (clamp + key init + bucket walk + phase 2 + phase 3) is untouched. ``log2_size = 5`` pins the sort to a 32-lane tile, matching the kernel's hard-coded ``block_dim = _K = 32``. Using the tiled form rather than the bare ``bitonic_sort_kv(...)`` wrapper keeps the sort width fixed at 32 even on AMDGPU wave64, where the bare wrapper would otherwise sort across all 64 lanes and mix in garbage from the inactive upper half. Requires the matching quadrants change to be installed (the public symbol ``qd.simt.subgroup.bitonic_sort_kv_tiled`` is added by that PR).

7 lines of prose -> 3. Same intent: explain why we use the tiled form with ``log2_size = 5`` rather than the bare ``bitonic_sort_kv`` wrapper.

Pair ``_K = qd.static(32)`` with ``_LOG2_K = qd.static(5)`` at the top of the kernel and pass ``_LOG2_K`` into ``bitonic_sort_kv_tiled``. The relationship between the sort width and ``_K`` is now visible at the binding site instead of being a magic 5 sitting next to ``_K = 32``.

With ``_K`` and ``_LOG2_K`` defined together at the top of the kernel and ``_LOG2_K`` flowing straight into ``bitonic_sort_kv_tiled``, the explainer ("pins the tiled sort to _K lanes on every backend ...") just restates what the names already convey.

``qd.static(32)`` is just the int ``32`` at compile time, so ``_K.bit_length() - 1`` evaluates to ``5`` and keeps _K and _LOG2_K in sync if _K is ever retuned.

``qd.static()`` is a no-op on Python int literals -- it evaluates its argument at compile time, and a plain ``32`` is already a Python compile-time int. Several other Genesis solver files wrap kernel-scope ``BLOCK_DIM`` / ``WARP_SIZE`` / ``_K`` constants this way as a defensive marker, but it doesn't change codegen and the bare ints read more directly.

duburcqa · 2026-06-01T16:20:48Z

-                            if not their_lt_mine and (their_key != my_key or their_idx != my_idx):
-                                my_key = their_key
-                                my_idx = their_idx
+                my_key, my_idx = qd.simt.subgroup.bitonic_sort_kv_tiled(my_key, my_idx, _LOG2_K)


Could we have an API that does not require specifying _LOG2_K? Like this could be the default value, because I guess block_dim can be queried from context no? if not, then we should probably expose it. It should not be that hard, since it is already possible to set it from python.

hughperkins added 6 commits May 27, 2026 12:25

collider: shorten the bitonic-sort comment

ddf6809

7 lines of prose -> 3. Same intent: explain why we use the tiled form with ``log2_size = 5`` rather than the bare ``bitonic_sort_kv`` wrapper.

collider: derive _LOG2_K from _K instead of hardcoding 5

47bbe19

``qd.static(32)`` is just the int ``32`` at compile time, so ``_K.bit_length() - 1`` evaluates to ``5`` and keeps _K and _LOG2_K in sync if _K is ever retuned.

duburcqa reviewed Jun 1, 2026

View reviewed changes

duburcqa approved these changes Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MISC] Switch dedupe contact sort to use Quadrants bitonic sort#2853

[MISC] Switch dedupe contact sort to use Quadrants bitonic sort#2853
hughperkins wants to merge 6 commits into
Genesis-Embodied-AI:mainfrom
hughperkins:hp/use-bitonic-sort-kv

hughperkins commented May 27, 2026

Uh oh!

duburcqa Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hughperkins commented May 27, 2026

Description

Related Issue

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

Uh oh!

duburcqa Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants