Add sycl_khr_free_function_commands extension #922

slawekptak · 2025-10-10T12:46:16Z

This is a new, follow-up PR to #644, originally created by John Pennycook. All the future work related to that PR will be continued here. The reason for creating a new PR is that the PR ownership transfer is required.

This extension provides an alternative mechanism for submitting commands to a device via free-functions that require developers to opt-in to the creation of event objects.

It also proposes alternative names for several commands (e.g., launch) and simplifies some concepts (e.g., by removing the need for the nd_range class).


Previous "0 or more" wording only made sense when reductions could be optionally provided to functions like parallel_for; now that there are dedicated *_reduce functions, at least one reduction is required.

"is" is more consistent with ISO C++ wording.

Co-authored-by: Greg Lueck <[email protected]>

There is no need to constrain T here because T must be device-copyable in order to construct the accessor passed as an argument.

Renaming sycl::nd_item is not a necessary part of the API redesign for submitting work, so it should be moved to its own extension. This will also give us more time to consider the design and naming of any proposed replacement(s), including how they should interact with new functionality proposed in other KHRs.

There are currently no backends that define interop for reductions, so we can remove these functions for now. If we decide later that these functions are necessary, we can release a revision of the KHR.

Co-authored-by: Andrey Alekseenko <[email protected]>

Commands like copy, memcpy, fill, etc are not kernels and so passing a kernel_bundle as a requirement is not meaningful.

Commands like copy, memcpy, fill, etc take their arguments explicitly rather than being captured by a function, and so there is no need to inform the runtime about which accessors are used. If a command uses an accessor, it must have been passed as an argument.

Any accessor passed to a command that will run on the device must have target::device.

These functions are equivalent to the host task submission functions.

adoc/extensions/sycl_khr_free_function_commands.adoc

gmlueck · 2025-10-10T18:38:52Z

adoc/extensions/sycl_khr_free_function_commands.adoc

+namespace sycl::khr {
+
+template <typename... Requirements>
+std::optional<event> event_barrier(const queue& q, const requirements<Requirements...>& reqs = {});


I'd like to reconsider the name of this function. Using the word "event" in the name makes little sense. Neither this function nor command_barrier above directly take any event objects as input. However, both this function and command_barrier do indirectly take event objects via requirements. In fact, both functions have exactly the same semantics regarding the input event(s) -- in both cases the barrier waits until all the events are complete.

Therefore, it is not the event semantics that distinguishes this API from command_barrier. The difference is that command_barrier implicitly waits for all previous commands in the queue to complete, while event_barrier does not.

At one point we had proposed partial_barrier as the name for this function. Can anyone remember why we didn't like that? That seems like a better name to me than event_barrier.

I think partial_barrier sounds good. We can also consider command_event_barrier and event_barrier. These seem more descriptive and reflect the actual behavior.

AFAIK, it did predate the requirements. But now that we have requirements, do we still need both functions?

Can we just add a bool to command_barrier, like implicit_all or something?

If not, partial_barrier does indeed sound better.
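
To make the distinction in this thread concrete, here is a minimal toy model of the two dependency sets. It is only a sketch: `toy_queue`, `command_id`, `command_barrier_deps` and `event_barrier_deps` are hypothetical names invented for illustration and are not part of SYCL or of the proposed KHR. Events passed via requirements are modelled as plain command ids.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model only: none of these types are the SYCL API.
using command_id = int;

struct toy_queue {
    std::vector<command_id> submitted;  // every command submitted so far, in order
    command_id next_id = 0;

    // Record a new command and return its id.
    command_id submit() {
        submitted.push_back(next_id);
        return next_id++;
    }
};

// Models command_barrier: depends on every command already in the queue,
// plus whatever the requirements name explicitly.
std::set<command_id> command_barrier_deps(const toy_queue& q,
                                          const std::set<command_id>& required) {
    std::set<command_id> deps(q.submitted.begin(), q.submitted.end());
    deps.insert(required.begin(), required.end());
    return deps;
}

// Models event_barrier (or partial_barrier): depends only on the requirements.
std::set<command_id> event_barrier_deps(const toy_queue& /*q*/,
                                        const std::set<command_id>& required) {
    return required;
}
```

In this model both functions treat the required events identically; the only difference is the implicit dependency on earlier commands in the queue. With an empty requirement set, `event_barrier_deps` returns nothing to wait for, which matches the note that such a barrier may be a no-op.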

gmlueck · 2025-10-10T18:56:19Z

adoc/extensions/sycl_khr_free_function_commands.adoc

+
+_Constraints_:
+
+* [code]#Requirements# does not contain a [code]#kernel_bundle#.


John originally proposed in this comment that there should be a constraint here that Requirements does not have any accessors with target target::device. However, this depends on what semantics we want for this launch_host function.

If we expect that launch_host has all the same features as handler::host_task, then we need to allow both target::host_task and target::device accessors because target::device accessors have a special interaction with the interop_handle. However, we have noted in the past that it might be better to split out the interop_handle part of host tasks into a separate API.

If our goal is to provide cleaner semantics for this launch_host command, then maybe we only want it to support the case where it runs pure host code and eliminate the backend interop part. We could add that later as a separate API. If we go this route, then we probably do want a constraint here that Requirements does not have any accessors with target target::device.

I had assumed the standard and interop versions would be integrated into one, consistent with the handler version, but I can see that it might be cleaner to separate them. We might then have something like launch_host_interop, making it a separate use case and consistent with the interop_handle class name. What do you think?

Limit host tasks accessors to target::host_task done here: 166bd46

My $0.002 is the same as for the local_accessor: I think we should keep this PR from doing too much (and it already does a lot). IMO it should be "easy" for people to move to this new KHR, just changing their API calls and not their kernels.

If we expect that launch_host has all the same features as handler::host_task

So yes, I will vote for that.

If one day we want to refactor host_task, to create a new one, we can do that with another API call.

If our goal is to provide a cleaner semantic of this launch_host command,

I think it's our goal too, but maybe not in this PR.
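
For readers following the constraint discussion above, the checks on `Requirements` can be sketched with ordinary C++17 traits. This is an illustration under stated assumptions, not the specification's wording or any implementation: `kernel_bundle_like`, `accessor_like`, `event_like`, `requirements_like` and `launch_host_like` are hypothetical stand-ins, and the second trait models the stricter option (no `target::device` accessors) discussed in this thread.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; these are NOT the real SYCL types.
enum class target { device, host_task };
struct kernel_bundle_like {};
struct event_like {};
template <target T> struct accessor_like {};
template <typename... Rs> struct requirements_like {};

// True if any type in the pack is the kernel bundle stand-in.
template <typename... Rs>
inline constexpr bool has_kernel_bundle_v =
    (std::is_same_v<std::decay_t<Rs>, kernel_bundle_like> || ...);

// True if any type in the pack is an accessor stand-in with target::device.
template <typename... Rs>
inline constexpr bool has_device_accessor_v =
    (std::is_same_v<std::decay_t<Rs>, accessor_like<target::device>> || ...);

// A launch_host-like function that only participates in overload resolution
// when both constraints hold, in the spirit of an ISO C++ "Constraints:" clause.
template <typename F, typename... Rs,
          std::enable_if_t<!has_kernel_bundle_v<Rs...> &&
                               !has_device_accessor_v<Rs...>,
                           int> = 0>
void launch_host_like(const requirements_like<Rs...>& /*reqs*/, const F& f) {
    f();  // toy model: run the host function immediately
}
```

Keeping the interop case (which needs `target::device` accessors) would mean dropping the second condition, or moving it to a separate function such as the `launch_host_interop` idea floated above.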

gmlueck · 2025-10-10T18:59:14Z

adoc/extensions/sycl_khr_free_function_commands.adoc

+{note}If an [code]#event_barrier# is submitted with no requirements, then this
+operation may be a no-op.{endnote}
+
+'''


Adding a comment here from #644, so it doesn't get lost ... @aelovikov-intel noted in this comment that we should add an Examples section illustrating how to use buffers and accessors with this new KHR.

gmlueck · 2025-10-10T19:02:14Z

adoc/extensions/sycl_khr_free_function_commands.adoc

+namespace sycl::khr {
+
+template <typename KernelType, typename... Requirements>
+std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, const KernelType& k); (1)


Adding a comment from #644, so it doesn't get lost ... @aelovikov-intel proposed in this comment that we should change the type of all the const KernelType& parameters to KernelType&&.

This change is done in DPC++, for the launch_grouped functions (handler-less), and if the kernel copy is required, it is moved if possible. This might be an opportunity to do some testing if needed.
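
The practical effect of switching from `const KernelType&` to `KernelType&&` can be seen in a small standalone sketch. It assumes the runtime needs its own copy of the kernel object; `counting_kernel`, `launch_by_const_ref` and `launch_by_forwarding_ref` are hypothetical names for illustration and do not reflect how DPC++ or any other implementation stores kernels.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical kernel functor that counts how it was transferred.
struct counting_kernel {
    static inline int copies = 0;
    static inline int moves = 0;
    counting_kernel() = default;
    counting_kernel(const counting_kernel&) { ++copies; }
    counting_kernel(counting_kernel&&) noexcept { ++moves; }
    void operator()() const {}
};

// const& parameter: storing the kernel always costs a copy,
// even when the caller passed a temporary.
template <typename KernelType>
void launch_by_const_ref(const KernelType& k) {
    KernelType stored(k);
    stored();
}

// Forwarding reference: lvalues are copied, rvalues are moved.
template <typename KernelType>
void launch_by_forwarding_ref(KernelType&& k) {
    std::decay_t<KernelType> stored(std::forward<KernelType>(k));
    stored();
}
```

For small lambdas the difference is usually negligible; it matters mainly for kernels that capture objects that are expensive to copy.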

Co-authored-by: Greg Lueck <[email protected]>

tomdeakin · 2025-10-16T16:22:08Z

The WG discussed this, and feels we need a solution for local memory in this KHR.

PeterTh · 2025-10-16T16:44:58Z

Regarding local memory: to me, it seems like the least invasive strategy (as in, it doesn't depend on many other changes) that fits with the current specification of this extension would be using requirements for local accessors - since it's a natural fit with how non-local accessors are proposed to be handled. A future extension for e.g. static work group memory could then make that superfluous where it applies.

Revamp the proposed specification to provide convenience APIs that are similar to CUDA's `cudaEventRecord` and `cudaStreamWaitEvent` because this is the immediate request from our customer. I think we do still want to add a `record_event` property, but I think we could add that separately as part of the KHR being proposed in KhronosGroup/SYCL-Docs#922, or as a separate oneapi extension based on that KHR.

TApplencourt · 2025-10-30T14:32:04Z

Agree with @PeterTh; I would like to keep the changes in this PR "minimal" so we can merge it, and then we can discuss new features. I want to avoid the feature creep problem. This PR is immensely useful as is, so no need to do everything in one go :)

Pennycook and others added 30 commits October 17, 2024 15:00

Merge branch 'main' into khr_free_function_commands

ebe4dc0

Reword khr_free_function_commands comment

ce08652

Add periods to khr_free_function_commands comments

372bb3b

Add + marks to code blocks containing ...

9747f7a

Require at least 1 reduction in *_reduce functions

2527e90

Previous "0 or more" wording only made sense when reductions could be optionally provided to functions like parallel_for; now that there are dedicated *_reduce functions, at least one reduction is required.

Replace "must be" with "is" in constraints

f63adeb

"is" is more consistent with ISO C++ wording.

Use bulleted list for multiple constraints

47c08f8

Rewrite preconditions for USM copy functions

15fd80a

Fix typo in non-normative note

9c62792

Define kernel object overloads via equivalence

4377549

Clarify dependencies for command_/event_barrier

c24fb13

Clarify that event_barrier can be a no-op

664a912

Add missing invocation constructor

16111b2

Restart numbering at 1 in each synopsis block

88b540a

Replace backticks with [code] environment

2575901

Add + marks to code blocks containing ... again

8fa1ce2

Fix grammar: "is" to "are"

8d79af4

Co-authored-by: Greg Lueck <[email protected]>

Fix formatting of bulleted lists

2ba2394

Fix more instances of "is" that should be "are"

e382dbc

Remove unnecessary device-copyable constraint

a9bdc10

There is no need to constrain T here because T must be device-copyable in order to construct the accessor passed as an argument.

Remove empty issues section

fa8a8f6

Add missing constraints to fill overloads

db380b4

Remove *_reduce functions for kernel objects

d26831a

There are currently no backends that define interop for reductions, so we can remove these functions for now. If we decide later that these functions are necessary, we can release a revision of the KHR.

Fix copy-paste error in launch_task definition

75867f7

Remove unnecessary "is"

151f632

Co-authored-by: Andrey Alekseenko <[email protected]>

Explain potential performance overhead of events

32d11f5

Add no-op note to command_barrier

165d07e

Weaken note about no-op from "is" to "may be"

be56bb7

Pennycook and others added 6 commits June 17, 2025 20:44

Add exposition-only paragraph to function synopses

2afc9f0

Add constraints to limit kernel bundles to kernels

f31095a

Commands like copy, memcpy, fill, etc are not kernels and so passing a kernel_bundle as a requirement is not meaningful.

Add constraints to limit accessor targets

1ee9698

Any accessor passed to a command that will run on the device must have target::device.

Use requirements in free function commands example

547c100

Merge branch 'main' into khr_free_function_commands

896d71f

slawekptak requested review from Pennycook, TApplencourt, gmlueck, jbrodman, keryell, nliber and tomdeakin October 10, 2025 12:46

sycl-issue-bot bot mentioned this pull request Oct 10, 2025

[Spec change] Add sycl_khr_free_function_commands extension KhronosGroup/SYCL-CTS#1146

Open

slawekptak mentioned this pull request Oct 10, 2025

Add sycl_khr_free_function_commands extension #644

Draft

slawekptak marked this pull request as ready for review October 10, 2025 13:15

slawekptak added 2 commits October 10, 2025 13:42

Merge branch 'main' into khr_free_function_commands_new

de8854f

Add launch_host functions

11ed357

These functions are equivalent to the host task submission functions.

gmlueck reviewed Oct 10, 2025


slawekptak and others added 3 commits October 13, 2025 13:10

Fix formatting

17c66d9

Apply suggestion from @gmlueck

0b30eb1

Co-authored-by: Greg Lueck <[email protected]>

Limit the accessor target for host tasks to target::host_task

166bd46

KornevNikita mentioned this pull request Oct 23, 2025

Fix free_function_commands test KhronosGroup/SYCL-CTS#1150

Merged

gmlueck mentioned this pull request Oct 31, 2025

[SYCL][Doc] Add spec to reuse an event intel/llvm#20309

Merged

AlexeySachkov mentioned this pull request Nov 21, 2025

[Coverity] Move from a user-supplied l-value reference intel/llvm#20722

Closed

