[SYCL][NVPTX] Optimize ID queries when they fit in int #18999

frasercrmck · 2025-06-16T15:48:40Z

The NVPTX target was unable to properly optimize the global ID query, despite the user specifying the -fsycl-id-queries-fit-in-int flag.

This is because, once linked, the compiler sees the global ID builtin as (i64 add (mul (i64 zext i32 A), (i64 zext i32 B), (i64 zext i32 C))). Despite knowing that each of A, B and C are 32-bit values, and the final result fits in a 32-bit value, it is not legal to replace this sequence with (i64 zext (add i32 (mul i32 A, B), C)), which is the ideal code here.

The solution to this problem is a new opt-in 'reflection' in the NVPTX implementation of the global ID builtin, which selects a more optimal version. The driver enables this reflection only when the user passes -fsycl-id-queries-fit-in-int.

The NVPTX target was unable to properly optimize the global ID query, despite the user specifying the -fsycl-id-queries-fit-in-int flag. This is because, once linked, the compiler sees the global ID builtin as (i64 add (mul (i64 zext i32 A), (i64 zext i32 B), (i64 zext i32 C))). Despite knowing that each of A, B and C are 32-bit values, and the final result fits in a 32-bit value, it is not legal to replace this sequence with (i64 zext (add i32 (mul i32 A, B), C)), which is the ideal code here. The solution to this problem is a new opt-in 'reflection' in the NVPTX implementation of the global ID builtin, which selects a more optimal version. The driver enables this reflection only when the user passes -fsycl-id-queries-fit-in-int.

frasercrmck · 2025-06-16T16:02:08Z

With this patch, a SYCL kernel such as

  q.submit([&](sycl::handler& cgh){
    cgh.parallel_for<GetGlobalId>(sycl::nd_range<1>{N, 1024}, [=](sycl::nd_item<1> item){
      unsigned int globalId{static_cast<unsigned int>(item.get_global_id()[0])};
      A[globalId] = globalId;
    });
  });

will be optimized to

.weak .entry _ZTS11GetGlobalId(
.param .u64 .ptr .global .align 4 _ZTS11GetGlobalId_param_0
)
{
.reg .b32 %r<5>;
.reg .b64 %rd<4>;

//
ld.param.b64 %rd1, [_ZTS11GetGlobalId_param_0];
mov.u32 %r1, %ctaid.x;
mov.u32 %r2, %ntid.x;
mov.u32 %r3, %tid.x;
mad.lo.s32 %r4, %r2, %r1, %r3;
mul.wide.u32 %rd2, %r4, 4;
add.s64 %rd3, %rd1, %rd2;
st.global.b32 [%rd3], %r4;
ret;
//
}

as opposed to the bad old days (before this patch) where we got

.weak .entry _ZTS11GetGlobalId(
.param .u64 .ptr .global .align 4 _ZTS11GetGlobalId_param_0
)
{
.reg .b32 %r<4>;
.reg .b64 %rd<8>;

//
ld.param.b64 %rd1, [_ZTS11GetGlobalId_param_0];
mov.u32 %r1, %ctaid.x;
mov.u32 %r2, %ntid.x;
mul.wide.u32 %rd2, %r2, %r1;
mov.u32 %r3, %tid.x;
cvt.u64.u32 %rd3, %r3;
add.s64 %rd4, %rd2, %rd3;
and.b64 %rd5, %rd4, 4294967295;
shl.b64 %rd6, %rd5, 2;
add.s64 %rd7, %rd1, %rd6;
st.global.b32 [%rd7], %rd4;
ret;
//
}

ldrumm

And here I was, happy in the state of having forgotten about nvvm_reflect...

mdtoguchi

Looks OK for driver

frasercrmck requested review from a team as code owners June 16, 2025 15:48

frasercrmck requested a review from steffenlarsen June 16, 2025 15:48

frasercrmck had a problem deploying to WindowsCILock June 16, 2025 15:48 — with GitHub Actions Error

fix formatting

1779200

frasercrmck temporarily deployed to WindowsCILock June 16, 2025 15:59 — with GitHub Actions Inactive

ldrumm approved these changes Jun 16, 2025

View reviewed changes

frasercrmck temporarily deployed to WindowsCILock June 16, 2025 17:20 — with GitHub Actions Inactive

frasercrmck had a problem deploying to WindowsCILock June 16, 2025 17:20 — with GitHub Actions Failure

mdtoguchi approved these changes Jun 16, 2025

View reviewed changes

frasercrmck temporarily deployed to WindowsCILock June 17, 2025 05:00 — with GitHub Actions Inactive

steffenlarsen merged commit 2a58675 into intel:sycl Jun 17, 2025
28 of 29 checks passed

frasercrmck deleted the sycl-nvptx-id-queries-fit-in-int branch June 17, 2025 08:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYCL][NVPTX] Optimize ID queries when they fit in int #18999

[SYCL][NVPTX] Optimize ID queries when they fit in int #18999

frasercrmck commented Jun 16, 2025

Uh oh!

frasercrmck commented Jun 16, 2025

Uh oh!

ldrumm left a comment

Uh oh!

mdtoguchi left a comment

Uh oh!

Uh oh!

[SYCL][NVPTX] Optimize ID queries when they fit in int #18999

[SYCL][NVPTX] Optimize ID queries when they fit in int #18999

Conversation

frasercrmck commented Jun 16, 2025

Uh oh!

frasercrmck commented Jun 16, 2025

Uh oh!

ldrumm left a comment

Choose a reason for hiding this comment

Uh oh!

mdtoguchi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!